[Python-Dev] Fun with 2.3 shutdown (original) (raw)

Tim Peters tim.one at comcast.net
Thu Sep 18 23:06:01 EDT 2003


When the Zope3 tests are run under Python 2.3, after the test runner ends we usually get treated to a long string of these things:

""" Unhandled exception in thread started by Error in sys.excepthook: Original exception was:

"""

All that output is from Python, but it isn't particularly helpful .

Here's what I know:

For Zope readers: these are indeed due to the _Checkpoint threads that are left behind, sitting in

        self._event.wait(self._interval)

self._interval is 120 seconds (i.e., a relatively long time).

For Python readers:

Event.wait() with a timeout ends up in _Condition.wait(), where a lazy busy loop wakes up about 20 times per second to see whether it can proceed.

For some reason an exception is getting raised in the wait() code. I'm not exactly sure what or why yet, but that will come soon enough. For this message, it's enough to know that it happens (an educated guess comes later in this msg).

Python is in Py_Finalize, in its (new in 2.3) second call of PyGC_Collect():

/* Collect garbage.  This may call finalizers; it's nice to
     call these before all modules are destroyed. */
PyGC_Collect();

/* Destroy all modules */
PyImport_Cleanup();

/* Collect final garbage.  This disposes of cycles created by
   new-style class definitions, for example. */
PyGC_Collect();    WE'RE IN THIS ONE

Presumably the _Condition.wait() code wakes up after modules have been destroyed, and raises an exception due to referencing a None'ed-out threading.py global. A fact is that PyGC_Collect has called a del method when the problem occurs, and that's why some other thread was able to get the GIL and proceed (gc.collect() gave control back to Python code, via a Python del method).

Presumably this only causes a visible problem when running "at least most of" the Zope3 tests because the Zope3 tests create a ton of containers and gc takes significant time then. Presumably if just a few tests are run, the second call to PyGC_Collect() goes so fast that it's unlikely the _Condition.wait() code will wake up and blow up before Python exits.

Back to for-sure stuff: the thread does raise an exception, and we wind up back in threadmodule.c t_bootstrap:

if (res == NULL) {   IT IS NULL
    if (PyErr_ExceptionMatches(PyExc_SystemExit))
        PyErr_Clear();
    else {
        PyObject *file;
        PySys_WriteStderr(
            "Unhandled exception in thread started by ");
        file = PySys_GetObject("stderr");
        if (file)
            PyFile_WriteObject(boot->func, file, 0);
        else
            PyObject_Print(boot->func, stderr, 0);
        PySys_WriteStderr("\n");
        PyErr_PrintEx(0);
    }
}

It's not PyExc_SystemExit, so we write

Unhandled exception in thread started by

to C's stderr and then get sys.stderr. Heh. sys has been wiped by this point, and file gets is Py_None. Brrrrrrrr. Everything after this is a comedy of unintended errors. file is true (so far as C is concerned), so we do

    PyFile_WriteObject(boot->func, file, 0);

That tries some tests that don't pass, then does

writer = PyObject_GetAttrString(f, "write");

None.write doesn't exist, so the exception we started with is lost and replaced by an unwanted AttributeError.

PyFile_WriteObject returns -1 then, but t_bootstrap doesn't check for that. Instead it writes a newline, and goes on to

        PyErr_PrintEx(0);

That eventually does

hook = PySys_GetObject("excepthook");

and for a now-familiar reason, that also returns None. Again hook looks true to C, and

    PyObject *result = PyEval_CallObject(hook, args);

overwrites the unwanted AttributeError with a different unwanted exception. PyErr_PrintEx notices that the call fails, and prints

"Error in sys.excepthook:\n"

to C's stderr. It then tries

        PyErr_Display(exception2, v2, tb2);

That starts with

PyObject *f = PySys_GetObject("stderr");

and again that returns None from the ruined sys module, and again f looks true to C. It becomes something of a mess then, so suffice it to say that PyErr_Display's attempts to write to file None via

PyFile_WriteString(buf, f)

don't accomplish anything, and then PyErr_Display() ends with

/* If an error happened here, don't show it.
   XXX This is wrong, but too many callers rely on this behavior. */
if (err != 0)
    PyErr_Clear();

So we come back with no exception set, and PyErr_PrintEx proceeds to write

"\nOriginal exception was:\n"

to C's stderr. Then it tries

        PyErr_Display(exception, v, tb);

and that goes thru the long dance of retrieving None from sys.stderr (etc) all over again.

The best thing for the Zope3 tests would be to find ways not to leave useless daemon threads hanging around.

I'm not sure what it says about Python's shutdown strategy. If I find and fix a None-module-global problem in _Condition.wait(), that's fine and probably helpful (in 2.3.1), but in the context of Zope3 the run() methods of these threads are going to trigger None-module-global problems of their own. That can also be fixed.

So a combination of changes to threading.py and Zope3 code can probably hide the shutdown glitches. Somehow, though, it just doesn't feel entirely satisfactory <0.9 wink>.

seemingly-out-of-gordian-knot-cutters-for-this-week-ly y'rs - tim



More information about the Python-Dev mailing list