Issue 30260: sock_dealloc() may call repr when socket class is already collected by GC (original) (raw)

I faced a crash during iterpreter shutdown (Py_Finalize), in final garbage collection phase.

The reason, as far as I could understand, was the following:

In Python 3, when an unclosed socket is destroyed, it issues a ResourceWarning, which contains a string representation of socket object (obtained by repr).

If socket class object and unclosed socket object (instance) are in a reference cycle, GC may collect socket class object before it collects an instance when dealing with that cycle. Then, when instance is collected, sock_dealloc() will be called, which will call PyErr_WarnFormat with %R format specifier, which triggers a call to repr:

static void sock_dealloc(PySocketSockObject *s) { if (s->sock_fd != INVALID_SOCKET) { // ... if (PyErr_WarnFormat(PyExc_ResourceWarning, 1, "unclosed %R", s)) //... }

At this moment, socket class object was already deallocated, so was repr, resides in it. Particularly, in my tests, when repr attempted to access a global variable (getattr), it tried to take it's name from co_names tuple in it's code object. This tuple was deallocated, and co_names pointed to tuple free list. Trying to use a tuple from free list as a dict key resulted in a crash (more detailed stack trace will be provided in attachment):

gdb python3.5 test.py

GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1 ... Starting program: /usr/local/bin/python3.5 test.py [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". test

Program received signal SIGSEGV, Segmentation fault. 0x000000000048998c in PyObject_Hash (v=0x0) at Objects/object.c:761 761 PyTypeObject *tp = Py_TYPE(v); (gdb) bt #0 0x000000000048998c in PyObject_Hash (v=0x0) at Objects/object.c:761 #1 0x0000000000498d96 in tuplehash (v=0x7ffff7f39948) at Objects/tupleobject.c:351 #2 0x00000000004899b2 in PyObject_Hash (v=0x7ffff7f39948) at Objects/object.c:763 ... tuplehash and PyObject_Hash here also ... #34 0x00000000004899b2 in PyObject_Hash (v=0x7ffff6801708) at Objects/object.c:763 #35 0x0000000000498d96 in tuplehash (v=0x7ffff6812ca8) at Objects/tupleobject.c:351 #36 0x00000000004899b2 in PyObject_Hash (v=0x7ffff6812ca8) at Objects/object.c:763 #37 0x000000000047a056 in PyDict_GetItemWithError (op=0x7ffff7e83608, key=0x7ffff6812ca8) at Objects/dictobject.c:1143 #38 0x000000000047a1e8 in _PyDict_LoadGlobal (globals=0x7ffff7e83608, builtins=0x7ffff6ea8a88, key=0x7ffff6812ca8) at Objects/dictobject.c:1192 #39 0x0000000000509117 in PyEval_EvalFrameEx (f=0xa196d8, throwflag=0) at Python/ceval.c:2408 #40 0x0000000000511d0b in _PyEval_EvalCodeWithName (_co=0x7ffff7e85270, globals=0x7ffff7e83608, locals=0x0, args=0x7ffff7f07c68, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4071 #41 0x0000000000511e00 in PyEval_EvalCodeEx (_co=0x7ffff7e85270, globals=0x7ffff7e83608, locals=0x0, args=0x7ffff7f07c68, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4092 #42 0x00000000005db41c in function_call (func=0x7ffff683ec80, arg=0x7ffff7f07c50, kw=0x0) at Objects/funcobject.c:627 #43 0x000000000043ae8f in PyObject_Call (func=0x7ffff683ec80, arg=0x7ffff7f07c50, kw=0x0) at Objects/abstract.c:2166 #44 0x00000000005cb13c in method_call (func=0x7ffff683ec80, arg=0x7ffff7f07c50, kw=0x0) at Objects/classobject.c:330 #45 0x000000000043ae8f in PyObject_Call (func=0x7ffff7eaca48, arg=0x7ffff7fa8048, kw=0x0) at Objects/abstract.c:2166 #46 0x00000000005138a9 in PyEval_CallObjectWithKeywords (func=0x7ffff7eaca48, arg=0x7ffff7fa8048, kw=0x0) at Python/ceval.c:4633 #47 0x00000000004abf3e in slot_tp_repr (self=0x7ffff684e8d0) at Objects/typeobject.c:5992 #48 0x0000000000488ef9 in PyObject_Repr (v=0x7ffff684e8d0) at Objects/object.c:482 #49 0x00000000004c03fb in unicode_fromformat_arg (writer=0x7fffffffdeb0, f=0x7ffff6ed9850 "R", vargs=0x7fffffffdef0) at Objects/unicodeobject.c:2645 #50 0x00000000004c0754 in PyUnicode_FromFormatV (format=0x7ffff6ed9846 "unclosed %R", vargs=0x7fffffffdf60) at Objects/unicodeobject.c:2710 #51 0x00000000004f90e7 in PyErr_WarnFormat (category=0x8a1a20 <_PyExc_ResourceWarning>, stack_level=1, format=0x7ffff6ed9846 "unclosed %R") at Python/_warnings.c:895 #52 0x00007ffff6ed438a in sock_dealloc (s=0x7ffff684e8d0) at /root/test/test35/cpython/Modules/socketmodule.c:4177 #53 0x000000000049d02c in subtype_dealloc (self=0x7ffff684e8d0) at Objects/typeobject.c:1209 #54 0x00000000004781c5 in free_keys_object (keys=0xa3fd90) at Objects/dictobject.c:351 #55 0x000000000047a908 in PyDict_Clear (op=0x7ffff7f55448) at Objects/dictobject.c:1388 #56 0x000000000047df02 in dict_tp_clear (op=0x7ffff7f55448) at Objects/dictobject.c:2595 #57 0x00000000005679de in delete_garbage (collectable=0x7fffffffe2f0, old=0x8ce440 <generations+64>) at Modules/gcmodule.c:866 #58 0x0000000000567e30 in collect (generation=2, n_collected=0x0, n_uncollectable=0x0, nofail=1) at Modules/gcmodule.c:1014 #59 0x0000000000568e84 in _PyGC_CollectNoFail () at Modules/gcmodule.c:1605 #60 0x0000000000532023 in PyImport_Cleanup () at Python/import.c:481 #61 0x0000000000542083 in Py_Finalize () at Python/pylifecycle.c:576 #62 0x0000000000420fdf in Py_Main (argc=2, argv=0x92d010) at Modules/main.c:798 #63 0x000000000041bc4f in main (argc=2, argv=0x7fffffffe748) at ./Programs/python.c:65 (gdb)

So this is something like a use-after-free.

I think the root cause is that finalizer that can call some python code is called at deallocation phase, interleaving with object's deletion. As well as with objects with del method, GC can't know the order in which objects should be deleted if they are in a reference cycle.

This problem does not exist in 3.6 because there is a separate finalizing method sock_finalize there where a ResourceWarning is issued, and such finalizers are called for the whole reference cycle before any object in it is deallocated (at least this is how I understood, correct me if I'm wrong). So, this change with finalizers probably should be backported to previous versions of Python 3.

I also could not reproduce this in 3.3, but I didn't have time to research that closely yet.