[Python-Dev] Handle errors in cleanup code

Tue Jun 13 00:10:05 EDT 2017

On Mon, Jun 12, 2017 at 1:07 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Aye, agreed. The key challenge we have is that we're trying to
> represent the exception state as a linked list, when what we really
> have once we start taking cleanup errors into account is an exception
> *tree*.
[...]
> P.S. trio's MultiError is also likely worth looking into in this context

Huh, yeah, this is some interesting convergent evolution.
trio.MultiError is exactly a way of representing a tree of exceptions,
though it's designed to do that for "live" exceptions rather than just
context chaining.

Briefly... the motivation here is that if you have multiple concurrent
call-stacks (threads/tasks/whatever-your-abstraction-is-called)
running at the same time, then it means you can literally have
multiple uncaught exceptions propagating out at the same time, so you
need some strategy for dealing with that. One popular option is to
avoid the problem by throwing away exceptions that propagate "too far"
(e.g., in the stdlib threading module, if an exception hits the top of
the call stack of a non-main thread, then it gets printed to stderr
and then the program continues normally). Trio takes a different
approach: its tasks are arranged in a tree, and if a child task exits
with an exception then that exception gets propagated into the parent
task. This allows us avoid throwing away exceptions, but it means that
we need some way to represent the situation when a parent task has
*multiple* live exceptions propagate into it at the same time. That's
what trio.MultiError is for.

Structurally, MultiError is just an Exception that holds a list of
child exceptions, like

   MultiError([TypeError(), OSError(), KeyboardInterrupt()])

so that they can propagate together. One design decision though is
that if you put a MultiError inside a MultiError, it *isn't*
collapsed, so it's also legal to have something like

    MultiError([
        TypeError(),
        MultiError([OSError(), KeyboardInterrupt()]),
    ])

Semantically, these two MultiErrors are mostly the same; they both
represent a situation where there are 3 unhandled exceptions
propagating together. The reason for keeping the tree structure is
that if the inner MultiError propagated for a while on its own before
meeting the TypeError, then it accumulated some traceback and we need
somewhere to store that information. (This generally happens when the
task tree has multiple levels of nesting.) The other option would be
to make two copies of this part of the traceback and attach one copy
onto each of the two exceptions inside it, (a) but that's potentially
expensive, and (b) if we eventually print the traceback then it's much
more readable if we can say "here's where OSError was raised, and
where KeyboardInterrupt was raised, and here's where they traveled
together" and only print the common frames once.

There's some examples of how this works on pages 38-49 of my language
summit slides here:
https://vorpus.org/~njs/misc/trio-language-summit-2017.pdf
And here's the source for the toy example programs that I used in the
slides, in case anyone wants to play with them:
https://gist.github.com/njsmith/634b596e5765d5ed8b819a4f8e56d306

Then the other piece of the MultiError design is catching them. This
is done with a context manager called MultiError.catch, which "maps"
an exception handler (represented as a callable) over all the
exceptions that propagate through it, and then simplifies the
MultiError tree to collapse empty and singleton nodes. Since the
exceptions inside a MultiError represent independent, potentially
unrelated errors, you definitely don't want to accidentally throw away
that KeyboardInterrupt just because your code knows how to handle the
OSError. Or if you have something like MultiError([OSError(),
TypeError()]) then trio has no idea which of those exceptions might be
the error you know how to catch and handle which might be the error
that indicates some terrible bug that should abort the program, so
neither is allowed to mask the other - you have to decide how to
handle them independently.

If anyone wants to dig into it the code is here:
https://github.com/python-trio/trio/blob/master/trio/_core/_multierror.py

Anyway, compared to the __cleanup_errors__ idea:

- Both involve a collection object that holds exceptions, but in the
MultiError version the collection subclasses BaseException. One
consequence is that you can put the exception collection object
directly into __context__ or __cause__ instead of using a new
attribute.

- MultiError allows for a tree structure *within* a single collection
of parallel exceptions. (And then of course on top of that each
individual exception within the collection can have its own chain
attached.) Since the motivation for this is wanting to preserve
traceback structure accumulated while the collection was propagating,
and __cleanup_errors__ is only intended for "dead" expections that
don't propagate, this is solving an issue that __cleanup_errors__
doesn't have.

- OTOH, it's not clear to me that you *want* to always stick cleanup
errors into a __context__-like attribute where they'll be mostly
ignored. Forcing the developer to guess ahead of time whether it's the
original error or the cleanup errors that are most important seems
pretty, well, error-prone. Maybe it would be more useful to have a
version of ExitStack that collects up the errors from inside the block
and from cleanup handlers and then raises them all together as a
MultiError. (If nothing else, this would let you avoid having to guess
which exceptions are more important than others, like you mention with
your reraise() idea for trying to prioritize KeyboardInterrupt over
other exceptions.)

> Since I don't see anything in the discussion so far that *requires*
> changes to the standard library (aside from "we may want to use this
> ourselves"), the right place to thrash out the design details is so
> probably contextlib2: https://github.com/jazzband/contextlib2
>
> That's where contextlib.ExitStack was born, and I prefer using it to
> iterate on context management design concepts, since we can push
> updates out faster, and if we make bad choices anywhere along the way,
> they can just sit around in contextlib2, rather than polluting the
> standard library indefinitely.

I'd also be open to extracting MultiError into a standalone library
that trio and contextlib2 both consume, if there was interest in going
that way.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org