[Python-Dev] Handle errors in cleanup code

Nathaniel Smith njs at pobox.com
Tue Jun 13 00:10:05 EDT 2017


On Mon, Jun 12, 2017 at 1:07 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Aye, agreed. The key challenge we have is that we're trying to represent the exception state as a linked list, when what we really have once we start taking cleanup errors into account is an exception tree. [...] P.S. trio's MultiError is also likely worth looking into in this context

Huh, yeah, this is some interesting convergent evolution. trio.MultiError is exactly a way of representing a tree of exceptions, though it's designed to do that for "live" exceptions rather than just context chaining.

Briefly... the motivation here is that if you have multiple concurrent call-stacks (threads/tasks/whatever-your-abstraction-is-called) running at the same time, then you can literally have multiple uncaught exceptions propagating out at the same time, so you need some strategy for dealing with that. One popular option is to avoid the problem by throwing away exceptions that propagate "too far" (e.g., in the stdlib threading module, if an exception hits the top of the call stack of a non-main thread, then it gets printed to stderr and then the program continues normally). Trio takes a different approach: its tasks are arranged in a tree, and if a child task exits with an exception then that exception gets propagated into the parent task. This allows us to avoid throwing away exceptions, but it means that we need some way to represent the situation when a parent task has multiple live exceptions propagate into it at the same time. That's what trio.MultiError is for.
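To make the "throw it away" behaviour concrete, here is a small sketch using only the stdlib threading module. The worker function and its message are made up for illustration; the point is just that the exception never reaches the code that started the thread.

```python
import threading


def worker():
    # Hypothetical failing task: nothing in this thread catches the error.
    raise ValueError("lost in a worker thread")


t = threading.Thread(target=worker)
t.start()
t.join()  # join() returns normally; the traceback only goes to stderr

# The main thread keeps running as if nothing had happened.
reached = True
```

Nothing in the main thread ever gets a chance to handle the ValueError, which is the situation a task tree with exception propagation is meant to avoid.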

Structurally, MultiError is just an Exception that holds a list of child exceptions, like

MultiError([TypeError(), OSError(), KeyboardInterrupt()])

so that they can propagate together. One design decision though is that if you put a MultiError inside a MultiError, it isn't collapsed, so it's also legal to have something like

MultiError([
    TypeError(),
    MultiError([OSError(), KeyboardInterrupt()]),
])

Semantically, these two MultiErrors are mostly the same; they both represent a situation where there are 3 unhandled exceptions propagating together. The reason for keeping the tree structure is that if the inner MultiError propagated for a while on its own before meeting the TypeError, then it accumulated some traceback and we need somewhere to store that information. (This generally happens when the task tree has multiple levels of nesting.) The other option would be to make two copies of this part of the traceback and attach one copy onto each of the two exceptions inside it, but (a) that's potentially expensive, and (b) if we eventually print the traceback then it's much more readable if we can say "here's where OSError was raised, and where KeyboardInterrupt was raised, and here's where they traveled together" and only print the common frames once.
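A minimal sketch of that structure, assuming a toy container class. ToyMultiError and leaves() are hypothetical names invented here, not trio's actual implementation; the sketch only shows that the flat and nested forms carry the same three leaf exceptions, while the nested form keeps an extra node that could hold its own traceback.

```python
class ToyMultiError(BaseException):
    """Hypothetical stand-in for trio.MultiError: holds child exceptions."""

    def __init__(self, exceptions):
        super().__init__(exceptions)
        self.exceptions = list(exceptions)


def leaves(exc):
    """Yield the non-container exceptions in the tree, left to right."""
    if isinstance(exc, ToyMultiError):
        for child in exc.exceptions:
            yield from leaves(child)
    else:
        yield exc


flat = ToyMultiError([TypeError(), OSError(), KeyboardInterrupt()])
nested = ToyMultiError([
    TypeError(),
    ToyMultiError([OSError(), KeyboardInterrupt()]),
])

flat_types = [type(e).__name__ for e in leaves(flat)]
nested_types = [type(e).__name__ for e in leaves(nested)]
```

Both lists contain the same three exception types in the same order; only the shape of the tree differs.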

There are some examples of how this works on pages 38-49 of my language summit slides here: https://vorpus.org/~njs/misc/trio-language-summit-2017.pdf And here's the source for the toy example programs that I used in the slides, in case anyone wants to play with them: https://gist.github.com/njsmith/634b596e5765d5ed8b819a4f8e56d306

Then the other piece of the MultiError design is catching them. This is done with a context manager called MultiError.catch, which "maps" an exception handler (represented as a callable) over all the exceptions that propagate through it, and then simplifies the MultiError tree to collapse empty and singleton nodes. Since the exceptions inside a MultiError represent independent, potentially unrelated errors, you definitely don't want to accidentally throw away that KeyboardInterrupt just because your code knows how to handle the OSError. Or if you have something like MultiError([OSError(), TypeError()]) then trio has no idea which of those exceptions might be the error you know how to catch and handle, and which might be the error that indicates some terrible bug that should abort the program, so neither is allowed to mask the other - you have to decide how to handle them independently.
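The "map a handler, then collapse" step can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the code behind MultiError.catch: ToyMultiError, apply_handler and swallow_oserror are hypothetical names, the handler convention (return the exception to keep it, None to swallow it) is assumed, and the sketch ignores the traceback bookkeeping a real implementation would need.

```python
class ToyMultiError(BaseException):
    """Hypothetical stand-in for trio.MultiError: holds child exceptions."""

    def __init__(self, exceptions):
        super().__init__(exceptions)
        self.exceptions = list(exceptions)


def apply_handler(handler, exc):
    """Apply handler to every leaf exception and simplify the tree.

    handler(exc) returns the exception to keep, or None to swallow it.
    """
    if not isinstance(exc, ToyMultiError):
        return handler(exc)
    kept = []
    for child in exc.exceptions:
        result = apply_handler(handler, child)
        if result is not None:
            kept.append(result)
    if not kept:
        return None        # empty node: everything below was handled
    if len(kept) == 1:
        return kept[0]     # singleton node: collapse to its only child
    return ToyMultiError(kept)


def swallow_oserror(exc):
    # This handler only knows how to deal with OSError.
    return None if isinstance(exc, OSError) else exc


tree = ToyMultiError([
    TypeError("bug"),
    ToyMultiError([OSError("disk"), KeyboardInterrupt()]),
])
remaining = apply_handler(swallow_oserror, tree)
remaining_types = [type(e).__name__ for e in remaining.exceptions]

all_handled = apply_handler(swallow_oserror, ToyMultiError([OSError("disk")]))
```

Handling the OSError leaves the TypeError and the KeyboardInterrupt untouched, and the inner node collapses because it is left with a single child.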

If anyone wants to dig into it, the code is here: https://github.com/python-trio/trio/blob/master/trio/_core/_multierror.py

Anyway, compared to the __cleanup_errors__ idea:

- Both involve a collection object that holds exceptions, but in the MultiError version the collection subclasses BaseException. One consequence is that you can put the exception collection object directly into __context__ or __cause__ instead of using a new attribute.

- MultiError allows for a tree structure within a single collection of parallel exceptions. (And then of course on top of that each individual exception within the collection can have its own chain attached.) Since the motivation for this is wanting to preserve traceback structure accumulated while the collection was propagating, and __cleanup_errors__ is only intended for "dead" exceptions that don't propagate, this is solving an issue that __cleanup_errors__ doesn't have.

- OTOH, it's not clear to me that you want to always stick cleanup errors into a __context__-like attribute where they'll be mostly ignored. Forcing the developer to guess ahead of time whether it's the original error or the cleanup errors that are most important seems pretty, well, error-prone. Maybe it would be more useful to have a version of ExitStack that collects up the errors from inside the block and from cleanup handlers and then raises them all together as a MultiError. (If nothing else, this would let you avoid having to guess which exceptions are more important than others, like you mention with your reraise() idea for trying to prioritize KeyboardInterrupt over other exceptions.)
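As a rough sketch of what such an ExitStack variant might look like: CollectingStack and ToyMultiError below are hypothetical names made up for this illustration. Nothing like this exists in contextlib or trio as far as this thread is concerned, and only the callback() method is modelled, loosely after ExitStack.callback.

```python
class ToyMultiError(BaseException):
    """Hypothetical stand-in for trio.MultiError: holds child exceptions."""

    def __init__(self, exceptions):
        super().__init__(exceptions)
        self.exceptions = list(exceptions)


class CollectingStack:
    """Hypothetical ExitStack variant: run every cleanup callback and
    raise the body error and all cleanup errors together."""

    def __init__(self):
        self._callbacks = []

    def callback(self, func, *args, **kwargs):
        self._callbacks.append((func, args, kwargs))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        errors = [] if exc is None else [exc]
        while self._callbacks:
            func, args, kwargs = self._callbacks.pop()  # LIFO, like ExitStack
            try:
                func(*args, **kwargs)
            except BaseException as cleanup_exc:
                errors.append(cleanup_exc)  # keep going with other callbacks
        if len(errors) > 1:
            raise ToyMultiError(errors)
        if errors and exc is None:
            raise errors[0]  # a single cleanup error, nothing from the body
        return False  # let a lone body error (if any) propagate unchanged


def fail_close():
    raise OSError("close failed")


collected = None
try:
    with CollectingStack() as stack:
        stack.callback(fail_close)
        raise TypeError("body failed")
except ToyMultiError as multi:
    collected = [type(e).__name__ for e in multi.exceptions]
```

Here neither the TypeError from the block nor the OSError from the cleanup handler is demoted to a side attribute; the caller sees both and has to decide what to do with each.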

> Since I don't see anything in the discussion so far that requires changes to the standard library (aside from "we may want to use this ourselves"), the right place to thrash out the design details is probably contextlib2: https://github.com/jazzband/contextlib2

> That's where contextlib.ExitStack was born, and I prefer using it to iterate on context management design concepts, since we can push updates out faster, and if we make bad choices anywhere along the way, they can just sit around in contextlib2, rather than polluting the standard library indefinitely.

I'd also be open to extracting MultiError into a standalone library that trio and contextlib2 both consume, if there was interest in going that way.

-n

-- Nathaniel J. Smith -- https://vorpus.org
