[Python-Dev] The recursion checking problem

Antoine Pitrou solipsis at pitrou.net
Sat Aug 30 22:06:00 CEST 2008

Previous message: [Python-Dev] script containing all opcs
Next message: [Python-Dev] The recursion checking problem
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

I was working on a recursion overflow checking bug (http://bugs.python.org/issue2548) and, while I've managed to produce a working patch, I've also become uncomfortable with the very idea of trying to plug all those holes just for the sake of plugging them. I'll try to explain why, by describing the conflicting factors I've identified:

* More and more, we are adding calls to Py_EnterRecursiveCall() and Py_LeaveRecursiveCall() all over the interpreter, to avoid special/obscure cases of undetected infinite recursion; this can probably be considered a good thing, but:
* After a recursion error has been raised (technically a RuntimeError), some code usually has to do cleanup after noticing the exception; this cleanup can now very easily bump into the recursion limit again, because of the point mentioned above (the funniest example of this is PyErr_ExceptionMatches, which makes a call to PyObject_IsSubclass, which itself increases the recursion count because __subclasscheck__ can be recursively invoked...).
* To counter the latter problem, py3k has introduced a somewhat smarter mechanism (which I've tracked down to a commit in the defunct p3yk branch by Martin): when the recursion limit is exceeded, a special flag named "overflowed" is set in the thread state structure. This flag disables the primary recursion check, so that cleanup code has some room to increase the recursion count. A secondary recursion check exists (equal to the primary one /plus/ 50) and, if it is reached, the interpreter aborts with a fatal error. The "overflowed" flag is cleared when the recursion count drops below the primary recursion limit /minus/ 50. Now it looks rather smart, but:
* Unfortunately, some functions inside the interpreter discard every exception by design. The primary example is PyDict_GetItem(), which is certainly used quite a lot :-)... When PyDict_GetItem() returns NULL, the caller can only assume that the key isn't in the dict; it has no way to know that there was a critical problem due to a recursion overflow.
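
To make the two-level check described above easier to follow, here is a minimal Python model of it. This is only a sketch of the behaviour as described in this message, not the actual C code: the names ThreadState, enter(), leave() and FatalError are made up for illustration, and the real interpreter aborts the process rather than raising an exception.

```python
class FatalError(Exception):
    """Hypothetical stand-in for the interpreter aborting with a fatal error."""


class ThreadState:
    """Toy model of the py3k recursion bookkeeping (illustrative names only)."""

    def __init__(self, limit, margin=50):
        self.limit = limit        # primary recursion limit
        self.margin = margin      # the "plus 50" / "minus 50" slack
        self.depth = 0            # current recursion count
        self.overflowed = False   # set once the primary limit has been hit

    def enter(self):
        """Model of Py_EnterRecursiveCall()."""
        self.depth += 1
        if self.overflowed:
            # Primary check is disabled; only the secondary check applies.
            if self.depth > self.limit + self.margin:
                raise FatalError("cannot recover from stack overflow")
            return
        if self.depth > self.limit:
            self.depth -= 1
            self.overflowed = True
            raise RuntimeError("maximum recursion depth exceeded")

    def leave(self):
        """Model of Py_LeaveRecursiveCall()."""
        self.depth -= 1
        # The flag is only cleared once the stack has unwound well below the limit.
        if self.depth < self.limit - self.margin:
            self.overflowed = False
```

With limit=100, the 101st enter() raises RuntimeError and sets the flag; cleanup code can then go 50 levels deeper, and the flag stays set until the depth falls below 50.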

I encountered the latter problem when trying to backport the py3k recursion overflow algorithm to trunk. A fatal error suddenly appeared in test_cpickle. It turned out that the recursion limit was exceeded in PyObject_RichCompare() and the resulting error was then cleared in PyDict_GetItem(), but the "overflowed" flag remained set, so that a subsequent recursion overflow would trigger the secondary check and lead to the fatal error.

I guess that, if it doesn't happen in py3k, it's just by chance: the recursion overflow is probably happening at another point where errors don't get discarded. Indeed, the failure I got on trunk was manifesting itself when running "regrtest.py test_cpickle" but not directly "test_cpickle.py"... which shows how delicate the recursion mechanism has become.

My attempt to solve the latter problem while still backporting the py3k scheme involves clearing the "overflowed" flag in PyErr_Clear(). This makes all tests pass, but it also means the "overflowed" flag loses a lot of its meaning, since PyErr_Clear() is called in a lot of places (and, especially, in PyDict_GetItem()).
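
The failure and the workaround can be reproduced in a small Python model. Again, this is a sketch under assumptions rather than the real implementation: Interp, lookup() and scenario() are hypothetical names, enter() mimics the C convention of returning a status and setting an error indicator, and the fatal error is modelled as an exception.

```python
class FatalError(Exception):
    """Hypothetical stand-in for the interpreter aborting with a fatal error."""


class Interp:
    """Toy interpreter state; all names here are illustrative."""

    def __init__(self, limit, margin=50, clear_flag_on_err_clear=False):
        self.limit = limit
        self.margin = margin
        self.depth = 0
        self.overflowed = False
        self.error = None  # models the pending-exception indicator
        self.clear_flag_on_err_clear = clear_flag_on_err_clear

    def enter(self):
        """Return True on success, False with self.error set on overflow."""
        self.depth += 1
        if self.overflowed:
            if self.depth > self.limit + self.margin:
                raise FatalError("cannot recover from stack overflow")
            return True
        if self.depth > self.limit:
            self.depth -= 1
            self.overflowed = True
            self.error = RuntimeError("maximum recursion depth exceeded")
            return False
        return True

    def leave(self):
        self.depth -= 1
        if self.depth < self.limit - self.margin:
            self.overflowed = False

    def err_clear(self):
        """Model of PyErr_Clear(), optionally resetting the flag as well."""
        self.error = None
        if self.clear_flag_on_err_clear:
            self.overflowed = False


def lookup(interp):
    """Model of a PyDict_GetItem()-style lookup that discards any error."""
    if not interp.enter():    # e.g. the key comparison hits the limit
        interp.err_clear()    # the error is swallowed by design
        return None           # caller sees "key not found"
    interp.leave()
    return "value"


def scenario(clear_flag_on_err_clear):
    interp = Interp(limit=100, clear_flag_on_err_clear=clear_flag_on_err_clear)
    for _ in range(100):      # deep, but still within the limit
        interp.enter()
    lookup(interp)            # overflow happens and is silently discarded
    try:
        while True:           # a later, unrelated runaway recursion
            if not interp.enter():
                return "RuntimeError"
    except FatalError:
        return "fatal error"
```

In this model, scenario(False) ends in the fatal error because the stale flag disables the primary check, while scenario(True) reports an ordinary RuntimeError. The price is the one noted above: once every error-clearing call resets the flag, the flag no longer reliably marks "we are still cleaning up after an overflow".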

Also, at this point I fear that the solution is becoming, because of its complexity, perhaps worse than the problem itself. That's why I'm bringing it here, to get your opinion.

(I also suggest that we stop trying to fix recursion checking bugs until the stable release, so as to give us some time to do the Right Thing later - if there is such a thing)

Regards

Antoine.
