(original) (raw)

Good point, I hadn't considered that it was regular common ref count 0 dealloc chaining. The processes unfortunately didn't have faulthandler enabled so there wasn't much info from where in the python code it happened (now fixed). I'll see if anything looks particularly unusual next time I hear of such a report.

-G

On Wed, Mar 27, 2019, 5:38 PM Tim Peters <tim.peters@gmail.com> wrote:
\[Gregory P. Smith <greg@krypto.org>\]
\> ...
\> A situation came up the other day where I believe this could've helped.
\>
\> Scenario (admittedly not one most environments run into): A Python process
\> with a C++ extension module implementing a threaded server (threads
\> spawned by C++) that could call back into CPython within server request
\> handlers. (ie: how all threaded servers tend to work regardless of core
\> loop implementation language)
\>
\> Python code in the application had done something (unknown to me, I
\> didn't dive into their code) that built up large enough presumably nested
\> or recursive data structures that the garbage collector, when triggered,
\> would wind up in very deep recursion. This caused a stack overflow
\> as the C++ spawned threads were only being given a 256k stack (to
\> conserve virtual address space - there can potentially be a \_ton\_ of
\> threads in this code).
\>
\> That had a C++ stack trace 1000+ levels deep repeating the pattern of
\>
\> ...
\> @ 0x564d59bd21de 32 func\_dealloc
\> @ 0x564d59bce0c1 32 cell\_dealloc
\> @ 0x564d5839db41 48 tupledealloc
\> @ 0x564d59bd21de 32 func\_dealloc
\> @ 0x564d59bce0c1 32 cell\_dealloc
\> @ 0x564d5839db41 48 tupledealloc
\> ...
\>
\> If our gc were done on a thread of its own spawned by Python, with a typical
\> normal larger default stack size (8mb) this would've been less likely
\> to trigger a crash (though obviously still possible if the recursion is linear).

Just noting that gc is probably a red herring in that scenario. gc
(cleverly!) determines the set of trash objects without needing any
recursive graph crawls. And gc doesn't deallocate anything - not
directly. It simply applies Py\_DECREF to trash objects, once for each
PyObject\* found pointing to them. \_If\_ deallocation occurs as a
result, it's the same as if the user had done \`del\` on the appropriate
object manually. The recursion then occurs in the chain of
XXX\_dealloc calls, which in turn do more Py\_DECREFs of their own, and
which have nothing in particular to do with gc. Approximately the
same stack depth would be hit in a Python built without gc if the user
did the same sequence of \`del\`s by hand to break trash cycles.

The good news is that the "trashcan" mechanism - which predates gc -
was introduced to limit call depth in the presence of some potentially
unbounded chains of dealloc calls. So teach trashcan about the
pattern here, and the stack depth problem goes away, with or without
gc.

The bad news is that the traschcan mechanism is excruciating, a
long-time source of subtle bugs of its own :-(

Note: without actual code to run, it's possible that trashcan is
\_already\_ trying to limit stack depth in this specific case, but the
stack size is too small to accommodate the largest depth of recursive
calls trashcan is willing to allow before interfering. Or some bug
snuck into trashcan, or its invoking code, that causes it to fail to
work as intended due to some unknown detail of the code above.
There's just no way to guess without code to provoke it.


\> Another take away from this is that it appears possible to cause our gc
\> to go into linear recursion.

As above, it's probably not gc, but deallocation as a side effect of
Py\_DECREF dropping a refcount to 0\. gc is just one way Py\_DECREF can
get invoked.