[Tutor] properly propagate problems

Cameron Simpson cs at cskk.id.au
Sat Mar 23 19:03:18 EDT 2019


On 23Mar2019 11:04, ingo janssen <ingoogni at gmail.com> wrote:
>One thing I often struggle with is how to deal with exceptions, 
>especially when I have a chain of functions that use each others 
>output and/or long running processes. As the answer will probably be 
>"it depends"

Oh yes!

The core rule of thumb is "don't catch an exception which you don't know 
how to handle", but that applies to truly unexpected errors not 
envisaged by the programmer: there, your programme aborts with a 
debugging stack trace.

Your situation below is more nuanced. Discussion below.

>take for example this program flow:
>
>open a file and read into BytesIO buffer
>get a FTP connection from pool
>send buffer to plantuml.jar in memory FTP server
>render file to image
>get image from FTP server
>push the image onto CherryPy bus
>push (SSE) the image to web browser
>
>def read_file(input_file):
>    try:
>        with open(input_file, 'rb') as f:
>            buffer = io.BytesIO(f.read())
>    except FileNotFoundError as e:
>        print(e)
>        ....
>    return buffer
>
>assume the file is not found, I cannot just kill the whole process. 
>Catching the exception is one thing, but how to deal with it properly, 
>I have to inform the client somehow what went wrong.

Given a function like that, I would be inclined to do one of two things:

A) don't make a policy decision (catching the exception) this close to 
the failure, instead let the exception out and let the caller handle it:

    def read_file(input_file):
        with open(input_file, 'rb') as f:
            return io.BytesIO(f.read())

    filename = "foo"
    try:
        buffer = read_file(filename)
    except OSError as e:
        error("could not load %r: %s", filename, e)
        ... failure action, maybe return from the function ...
    ... proceed with buffer ...

This leaves the policy decision with the calling code, which may have a 
better idea about what is suitable. For example, you might pass some 
useful response to your web client here. The low level function 
read_file() doesn't know that it is part of a web service.

The handy thing about exceptions is that you can push that policy 
decision quite a long way out. Provided the outer layer where you decide 
to catch the exception knows that this involved accessing a file you can 
put that try/except quite a long way out and still produce a sensible 
looking error response.

Also, the further out the policy try/except lives, the simpler the inner 
functions can be because they don't need to handle failure - they can be 
written for success provided that failures raise exceptions, making them 
_much_ simpler and easier to maintain. And with far fewer policy 
decisions!

The flip side to this is that there is a limit to how far out in the 
call chain this try/except can sensibly happen: if you're far enough out 
that the catching code _doesn't_ know that there was a file read 
involved, the error message becomes more vague (although you still have 
the exception instance itself with the low level detail).
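As a sketch of that (the function names here are hypothetical, not from 
the flow above), the try/except can sit a couple of layers above the 
file access and still report something sensible, because the exception 
instance carries the filename:

```python
import io

def read_file(input_file):
    # Mechanism: written for success; failures raise OSError.
    with open(input_file, 'rb') as f:
        return io.BytesIO(f.read())

def load_diagram(filename):
    # More mechanism: still no error handling.
    return read_file(filename)

def handle_request(filename):
    # Policy lives out here: decide what a failure means for the client.
    try:
        buffer = load_diagram(filename)
    except OSError as e:
        # str(e) still names the offending file.
        return "could not load diagram: %s" % e
    return "loaded %d bytes" % len(buffer.getvalue())
```

Only handle_request() knows it is part of a web service; the two inner 
functions stay pure mechanism.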

B) return None on failure:

    def read_file(input_file):
        try:
            with open(input_file, 'rb') as f:
                return io.BytesIO(f.read())
        except OSError as e:
            error(
                "read_file(%r): could not read input file: %s", 
                input_file, e)
            return None

None is a useful sentinel value for failure. Note that sometimes you 
will want something else if None is a meaningful return value in 
ordinary circumstances. Then your calling code can handle this without 
exceptions:

    buffer = read_file("foo")
    if buffer is None:
        ... return nice message to web client ...
    else:
        ... process the image ...

However, it does mean that this handling has to happen right at the call 
to read_file. That can be fine, but might be inconvenient.

Finally, some related points:

I find it useful to distinguish "mechanism" and "policy". In my ideal 
world a programme is at least 90% mechanism with a thin layer of policy 
outside it. Here "policy" is what might be termed "business logic" or 
"application logic" in some circumstances: what to do to achieve the 
high level goal. The high level is where you decide how to behave in 
various circumstances.

This has a few advantages. Almost all low level code is mechanism: it 
has a well defined, usually simple, purpose. By having almost all 
failures raise an exception you can make the low level functions very 
simple: do A, then B, then C until success, where you return the result; 
raise exceptions when things go wrong (failure to open files, invalid 
input parameters, what have you). This produces what I tend to call 
"white list" code: code which only returns a result when all the 
required operations succeed.

This is option (A) above, and makes for very simple inner functions.

For option (B) "return None on failure", this is where we decide that 
specific failures are in fact valid execution paths, and None is a valid 
function return, indicating some kind of null result. You might still 
raise exceptions of various types for invalid input in this case; the 
None is only for a well defined expected non-answer.
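If None ever becomes a legitimate return value, a private sentinel 
object does the same job; a minimal sketch:

```python
import io

# A unique sentinel, distinct from every real return value including None.
MISSING = object()

def read_file(input_file):
    try:
        with open(input_file, 'rb') as f:
            return io.BytesIO(f.read())
    except OSError:
        return MISSING

buffer = read_file("foo")
if buffer is MISSING:
    ...  # return a nice message to the web client
```

Because MISSING is compared with "is", nothing the function could 
legitimately return can ever be mistaken for failure.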

Regarding uncaught exceptions:

As you say, you don't want your whole app to abort. So while you may 
catch specific exception types at some inner layer, you might want to 
catch _all_ exceptions at the very outermost layer and log them (with a 
stack trace), but not abort. So:

    try:
        ... process client request ...
    except Exception as e:
        # log exception and stack trace to the application log
        error("handler failed: %s", e, exc_info=True)
        ... return a 500-series web response to the client here ...

This is one of those situations where you might use the normally reviled 
"catch all exceptions" anti-pattern: at the outermost layer of some kind 
of service programme such as a daemon or web app handling requests: 
report the exception and carry on with the application. Remember the 
Zen: errors should not pass silently. Always log something when you 
catch an exception.

Note that a primary reason to hate "catch all" is that such code often 
then proceeds to do more work with the bogus results. In a daemon or a 
web app, you're aborting _that request_. Any further work is shiny and 
new from a new request, not continuing with nonsensical data left around 
by a catch-all.

Fortunately web frameworks like Flask or CherryPy usually embed such a 
catch-everything in their handler logic, outside your own code (after 
all, what if your own catch-everything was buggy?). So you don't 
normally need to write one of these things yourself. Which is good, 
really, most of the time - they are a recipe for accidentally hiding 
errors. Let the framework do that one - it has been debugged for you.

Another issue is the distinction between what to log and what to show 
the client. You usually DO NOT want to let the nitty gritty of the 
exception get to the end user: that way lies accidental leaking of 
credentials or private implementation details. So log details, but 
return fairly bland information to the client. Try to write your code so 
that this is the default behaviour. Again, web frameworks generally do 
just this in their outermost catch-all handler: only if you turn on some 
kind of DEBUG mode does it splurge private stuff over the web page for 
ease of debugging in development.
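A minimal sketch of that split using the stdlib logging module (the 
worker callable here is hypothetical, standing in for your real request 
handler):

```python
import logging

log = logging.getLogger("webapp")

def handle_request(request, worker):
    # worker does the real work; this layer is purely reporting policy.
    try:
        return worker(request)
    except Exception:
        # Full detail, including the stack trace, goes to the log only.
        log.error("request %r failed", request, exc_info=True)
        # The client sees a deliberately bland message.
        return "500 Internal Server Error"
```

The stack trace and exception detail land in the application log; the 
client learns only that the request failed.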

Finally, I'm sure you've thought to yourself: if I catch an exception a 
long way from where it happened, won't the exception message lack all 
sorts of useful context about what happened? How useful is a log entry 
like this (from the outermost "OCR the document" level):

    error("OCR failed: %s", e)

producing:

    OCR failed: permission denied

because of a permission issue on a specific (but here, unnamed) file?

My own solution to this issue is my cs.pfx module (you can install this 
with "pip install cs.pfx").

This provides a context manager named Pfx which adorns exceptions with 
call stack information, totally under your control. It also has various 
.error and .warning etc methods which produce prefixed log messages.

Example:

    from cs.pfx import Pfx

    def read_file(input_file):
        with Pfx("read_file(%r)", input_file):
            with open(input_file, 'rb') as f:
                return io.BytesIO(f.read())

and outer calls might look like:

    def produce_image(image_name):
        with Pfx("produce_image(%r)", image_name):
            filename = path_to_image_file(image_name)
            buffer = read_file(filename)
            ... do stuff with the buffer ...

If the inner open fails, the exception message, which is originally like 
this:

    [Errno 2] No such file or directory: 'fffff'

becomes:

    produce_image('image_name'): read_file("/path/to/image_name.png"): [Errno 2] No such file or directory: '/path/to/image_name.png'

How much context you get depends on where you put the "with Pfx(...):" 
statements.

It also furthers simple code, because you no longer need to pepper your 
own exceptions with annoying repetitive context, just the core message:

    def read_file(input_file):
        with Pfx("read_file(%r)", input_file):
            if not input_file.startswith('/'):
                raise ValueError("must be an absolute path")
            with open(input_file, 'rb') as f:
                return io.BytesIO(f.read())

Because of the Pfx the ValueError gets the input_file value in question 
prefixed automatically, so you don't need to include it in your raise 
statement.
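For the curious, the core of that trick can be approximated in a few 
lines of stdlib Python. This is only a rough sketch of the idea - cs.pfx 
handles many exception types (such as OSError, whose message is built 
from several args) far more carefully:

```python
from contextlib import contextmanager

@contextmanager
def prefix(msg, *args):
    # On the way out of a failing block, prepend context to the
    # exception message and re-raise the same exception instance.
    try:
        yield
    except Exception as e:
        e.args = ((msg % args) + ": " + str(e),) + e.args[1:]
        raise

def read_file(input_file):
    with prefix("read_file(%r)", input_file):
        if not input_file.startswith('/'):
            raise ValueError("must be an absolute path")
        with open(input_file, 'rb') as f:
            return f.read()
```

Nested "with prefix(...):" blocks accumulate context in the same way as 
the produce_image()/read_file() example above.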

Hoping all this helps.

Short takeaway: decide what's mechanism and what is policy, and try to 
put policy further out in higher level code.

Cheers,
Cameron Simpson <cs at cskk.id.au>

Go not to the elves for counsel, for they will say both no and yes.
- Frodo, The Fellowship of the Ring

