Issue 33930: Segfault with deep recursion into object().dir (original) (raw)
Created on 2018-06-21 11:27 by a-j-buxton, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (38)
Author: Alistair Buxton (a-j-buxton)
Date: 2018-06-21 11:27
The following small snippet of code will crash 3.6.5 with a segfault.
The crash occurs at cleanup, so it won't happen in the interactive interpreter (until you exit).
--- code ---
o = object() for x in range(1000000): o = o.dir print(x, id(o.dir))
--- end code ---
Author: Terry J. Reedy (terry.reedy) *
Date: 2018-06-22 20:28
Serhiy, is this one of the cases we decided to leave as won't fix?
Author: Ronald Oussoren (ronaldoussoren) *
Date: 2018-06-25 10:58
The crash is likely caused by recursion during the clean-up of the object().dir.dir.... chain.
The trashcan API (see Py_TRASHCAN_SAFE_BEGIN/Py_TRASHCAN_SAFE_END in Include/object.h) can help here.
Those do some real work and have a cost, I can't predict if that cost is substational enough to affect benchmarks (method objects are destroyed a lot).
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2018-06-25 12:15
The trashcan mechanism is used for slot wrappers. The following code doesn't crash:
o = object() for x in range(1000000): o = o.hash
Author: Raymond Hettinger (rhettinger) *
Date: 2018-06-26 00:13
Alistair, how did you happen upon this case?
Author: STINNER Victor (vstinner) *
Date: 2018-06-26 08:14
C traceback in gdb:
Program received signal SIGSEGV, Segmentation fault. 0x00000000004812a8 in meth_dealloc ( m=<error reading variable: Cannot access memory at address 0x7fffff7fefd8>) at Objects/methodobject.c:86 86 { (gdb) where #0 0x00000000004812a8 in meth_dealloc ( m=<error reading variable: Cannot access memory at address 0x7fffff7fefd8>) at Objects/methodobject.c:86 #1 0x0000000000488d9f in _Py_Dealloc ( op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973490>) at Objects/object.c:1931 #2 0x00000000004813c4 in meth_dealloc (m=0x7fffe3973580) at Objects/methodobject.c:91 #3 0x0000000000488d9f in _Py_Dealloc ( op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973508>) at Objects/object.c:1931 #4 0x00000000004813c4 in meth_dealloc (m=0x7fffe39735f8) at Objects/methodobject.c:91 #5 0x0000000000488d9f in _Py_Dealloc ( op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973580>) at Objects/object.c:1931 #6 0x00000000004813c4 in meth_dealloc (m=0x7fffe3973670) at Objects/methodobject.c:91 #7 0x0000000000488d9f in _Py_Dealloc ( op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe39735f8>) at Objects/object.c:1931 #8 0x00000000004813c4 in meth_dealloc (m=0x7fffe39736e8) at Objects/methodobject.c:91 (...)
Author: Alistair Buxton (a-j-buxton)
Date: 2018-06-26 16:55
I found this while chatting on IRC about the implementation details of objects - specifically, since objects have a dict of attributes, and dict subclasses object, it seems like every dict will have another dict inside it. (Eventually someone explained that base object() doesn't have any state at all, as a result dict doesn't have a dict of attributes, and there is no cycle.)
Anyway, as I was trying to explain my question to the other users, I had intended to post this snippet and ask "why doesn't this crash python?" Ironically it turned out that it does crash python.
Author: Ronald Oussoren (ronaldoussoren) *
Date: 2018-06-26 19:10
@victor: thanks for confirming my analysis.
The trashcan API does fix this issue, although I did have to change "_PyObject_GC_UNTRACK(m);" to "PyObject_GC_UnTrack(m);" as well. I haven't checked yet why that's necessary.
Author: Augusto Goulart (augustogoulart) *
Date: 2018-11-13 13:13
ronaldoussoren, I have tested (Debian 9) substituting "_PyObject_GC_UNTRACK(m);" to "PyObject_GC_UnTrack(m);", and although the tests pass, I'm can still see the segfault error.
diff --git a/Objects/methodobject.c b/Objects/methodobject.c index cfea8cf410..cb8e279c7d 100644 --- a/Objects/methodobject.c +++ b/Objects/methodobject.c @@ -84,7 +84,7 @@ PyCFunction_GetFlags(PyObject *op) static void meth_dealloc(PyCFunctionObject *m) {
- _PyObject_GC_UNTRACK(m);
- PyObject_GC_UnTrack(m); if (m->m_weakreflist != NULL) { PyObject_ClearWeakRefs((PyObject*) m); }
Author: Augusto Goulart (augustogoulart) *
Date: 2018-11-13 14:30
Sorry, I forgot to include the trashcan API. I'll double check and update the issue.
Author: Augusto Goulart (augustogoulart) *
Date: 2018-11-27 12:19
After a couple of rounds, this is the benchmark result I got for the attached patch. It seems to be a simple, non-performance degrading fix.
python -m perf compare_to origin.json patch.json -G --min-speed=2 Slower (1):
- sqlalchemy_declarative: 213 ms +- 4 ms -> 219 ms +- 9 ms: 1.03x slower (+3%)
Faster (9):
- logging_format: 15.1 us +- 0.3 us -> 14.5 us +- 0.3 us: 1.05x faster (-5%)
- hexiom: 15.0 ms +- 0.5 ms -> 14.5 ms +- 0.2 ms: 1.03x faster (-3%)
- json_loads: 39.6 us +- 0.7 us -> 38.5 us +- 0.5 us: 1.03x faster (-3%)
- json_dumps: 17.2 ms +- 0.4 ms -> 16.7 ms +- 0.2 ms: 1.03x faster (-3%)
- fannkuch: 634 ms +- 4 ms -> 619 ms +- 6 ms: 1.02x faster (-2%)
- django_template: 183 ms +- 3 ms -> 178 ms +- 5 ms: 1.02x faster (-2%)
- float: 153 ms +- 2 ms -> 150 ms +- 1 ms: 1.02x faster (-2%)
- raytrace: 761 ms +- 19 ms -> 745 ms +- 8 ms: 1.02x faster (-2%)
- regex_effbot: 3.69 ms +- 0.06 ms -> 3.61 ms +- 0.05 ms: 1.02x faster (-2%)
Benchmark hidden because not significant (47): 2to3, chaos, crypto_pyaes, deltablue, dulwich_log, go, html5lib, logging_silent, logging_simple, mako, meteor_contest, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pickle_pure_python, pidigits, python_startup, python_startup_no_site, regex_compile, regex_dna, regex_v8, richards, scimark_fft, scimark_lu, scimark_monte_carlo, scimark_sor, scimark_sparse_mat_mult, spectral_norm, sqlalchemy_imperative, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process
Author: STINNER Victor (vstinner) *
Date: 2018-11-27 13:59
- _PyObject_GC_UNTRACK(m);
- PyObject_GC_UnTrack(m);
Can someone please explain me why this change has any impact? Does it mean that _PyObject_GC_UNTRACK() macro has a bug?
The function is quite straighforward :-)
void PyObject_GC_Track(void *op) { _PyObject_GC_TRACK(op); }
Author: Ronald Oussoren (ronaldoussoren) *
Date: 2018-11-28 00:07
I don’t understand this either. For some reason the patch works when using the function and not when using the macro.
I haven’t looked into this yet, and am currently traveling. I should have time to look into this starting December 5th when I’m back home.
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2018-11-28 08:27
It is hard to get a noticeable difference in macrobenchmarks. I want to see results of microbenchmarks.
For tuples, the trashcan mechanism is the main culprit of the performance hit when use short-living tuples for passing arguments. For bound method objects the relative effect should be smaller, because looking up the method is more costly than building a tuple, and less bound method objects are created after implementing the method call optimization. But in any case we should know the maximal cost of this change before merging it.
Author: Pablo Galindo Salgado (pablogsal) *
Date: 2018-11-28 22:53
Can someone please explain me why this change has any impact? Does it mean that _PyObject_GC_UNTRACK() macro has a bug?
Interestingly, this lonely and perplexing comment is left in PyObject_GC_UnTrack (the function):
void PyObject_GC_UnTrack(void op) { / Obscure: the Py_TRASHCAN mechanism requires that we be able to * call PyObject_GC_UnTrack twice on an object. */ if (_PyObject_GC_IS_TRACKED(op)) { _PyObject_GC_UNTRACK(op); } }
Author: Inada Naoki (methane) *
Date: 2018-11-30 11:57
microbench:
$ ./python.patched -m perf timeit --compare-to ./python.master -s 'o="foo"' 'o.encode' python.master: ..................... 29.3 ns +- 0.6 ns python.patched: ..................... 30.9 ns +- 1.3 ns
Mean +- std dev: [python.master] 29.3 ns +- 0.6 ns -> [python.patched] 30.9 ns +- 1.3 ns: 1.05x slower (+5%)
And this is perf report of python.patched:
33.17% _PyEval_EvalFrameDefault 15.25% _PyObject_GenericGetAttrWithDict 9.63% PyCFunction_NewEx 8.25% _PyType_Lookup 6.42% meth_dealloc 3.50% _Py_bytes_contains 3.22% PyObject_GC_UnTrack 3.20% method_get 2.73% PyObject_GetAttr 2.55% repeat_next 0.88% _Py_Dealloc 0.81% PyObject_GenericGetAttr 0.78% sre_ucs1_match 0.50% lookdict_unicode_nodummy
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2018-11-30 12:17
Thank you Inada. There is a large overhead for iterating. Try to add --duplicate=1000.
Author: Inada Naoki (methane) *
Date: 2018-11-30 12:54
$ ./python.patched -m perf timeit --compare-to ./python.master -s 'o="foo"' 'o.encode' --duplicate=10000 python.master: ..................... 23.1 ns +- 0.5 ns python.patched: ..................... 24.6 ns +- 0.5 ns
Mean +- std dev: [python.master] 23.1 ns +- 0.5 ns -> [python.patched] 24.6 ns +- 0.5 ns: 1.06x slower (+6%)
$ perf record ./python.patched -m perf timeit -s 'o="foo"' 'o.encode' --duplicate=10000 Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. ..................... Mean +- std dev: 24.9 ns +- 0.7 ns [ perf record: Woken up 9 times to write data ] [kernel.kallsyms] with build id debb7f6ce8aad7263cc6564650a88db5c8003389 not found, continuing without symbols [ perf record: Captured and wrote 2.491 MB perf.data (61787 samples) ]
$ perf report | cat (snip)
Total Lost Samples: 0
Samples: 61K of event 'cycles:ppp'
Event count (approx.): 52880893198
Overhead Command Shared Object Symbol
........ .............. ................................................ ..........................................
18.78% python.patched python.patched [.] _PyEval_EvalFrameDefault
14.85% python.patched python.patched [.] _PyObject_GenericGetAttrWithDict
9.93% python.patched python.patched [.] PyCFunction_NewEx
8.56% python.patched python.patched [.] _PyType_Lookup
6.61% python.patched python.patched [.] meth_dealloc
4.12% python.patched python.patched [.] PyParser_AddToken
3.36% python.patched python.patched [.] PyObject_GetAttr
3.33% python.patched python.patched [.] PyObject_GC_UnTrack
3.25% python.patched python.patched [.] method_get
2.92% python.patched python.patched [.] _Py_bytes_contains
2.13% python.patched python.patched [.] freechildren
1.41% python.patched python.patched [.] pymalloc_alloc.isra.6.part.7
Author: Irit Katriel (iritkatriel) *
Date: 2021-08-06 18:43
I've reproduced the segfault on 3.11 on a Mac.
Author: STINNER Victor (vstinner) *
Date: 2021-08-09 13:22
Correctness matters more than performance for 1 nanosecond. IMO a fix like attached bpo-33930.patch looks like the good approach to fix the crash.
Author: Neil Schemenauer (nascheme) *
Date: 2021-08-10 21:13
I'm thinking that the explicit call to PyObject_GC_UnTrack should be made unnecessary by integrating it into the trashcan code. That way, we avoid someone else running into this kind of bug in the future. See bpo-44881.
Author: Pablo Galindo Salgado (pablogsal) *
Date: 2021-08-10 23:34
New changeset bfc2d5a5c4550ab3a2fadeb9459b4bd948ff61a2 by Pablo Galindo Salgado in branch 'main': bpo-33930: Fix segfault with deep recursion when cleaning method objects (GH-27678) https://github.com/python/cpython/commit/bfc2d5a5c4550ab3a2fadeb9459b4bd948ff61a2
Author: STINNER Victor (vstinner) *
Date: 2021-08-11 08:32
New changeset 15f0a45b2822607f2c38685a72d06f050e24b215 by Miss Islington (bot) in branch '3.9': bpo-33930: Fix segfault with deep recursion when cleaning method objects (GH-27678) (GH-27720) https://github.com/python/cpython/commit/15f0a45b2822607f2c38685a72d06f050e24b215
Author: STINNER Victor (vstinner) *
Date: 2021-08-11 08:32
New changeset d6d2d549972422ccc6fa335ebf9907e4b3d71817 by Miss Islington (bot) in branch '3.10': bpo-33930: Fix segfault with deep recursion when cleaning method objects (GH-27678) (GH-27719) https://github.com/python/cpython/commit/d6d2d549972422ccc6fa335ebf9907e4b3d71817
Author: STINNER Victor (vstinner) *
Date: 2021-08-11 08:33
Thanks for the fix Augusto Goulart and Pablo! I close the issue.
Pablo asked in PR 27678 if a Python 3.8 Release Manager (Lukasz) wants a backport to 3.8.
Author: Łukasz Langa (lukasz.langa) *
Date: 2021-08-11 15:35
New changeset 09d814b3833c2c852f4f233297b4561028782d2a by Łukasz Langa in branch '3.8': [3.8] bpo-33930: Fix segfault with deep recursion when cleaning method objects (GH-27678) (GH-27721) https://github.com/python/cpython/commit/09d814b3833c2c852f4f233297b4561028782d2a
Author: Łukasz Langa (lukasz.langa) *
Date: 2021-08-11 15:52
Apparently we've got ourselves refleaks with the patches.
It doesn't always reproduce but put test_exceptions -R: in a loop in 3.10 and soon enough you'll get:
0:00:00 load avg: 4.47 Run tests sequentially 0:00:00 load avg: 4.47 [1/1] test_exceptions beginning 9 repetitions 123456789 ......... test_exceptions leaked [2, 2, 2, 2] references, sum=8 test_exceptions leaked [2, 2, 2, 2] memory blocks, sum=8 test_exceptions failed (reference leak) in 59.6 sec
== Tests result: FAILURE ==
1 test failed: test_exceptions
It's much harder to reproduce on 3.9 and 3.8 because test_exceptions has fewer tests and at least on my MBP finishes twice as fast. I ran 3.9 in a loop for an hour and no refleak.
Author: Łukasz Langa (lukasz.langa) *
Date: 2021-08-11 15:57
Interestingly, I can't reproduce the refleaks on main
either, only on 3.10.
Author: Łukasz Langa (lukasz.langa) *
Date: 2021-08-11 16:37
I gotta run for the day. The shortest way for me to reproduce is to focus on the tests that were added in 3.10 and shortening the repetitions a little:
$ time ./python.exe -E -Wd -m test test_exceptions -R3:3 -m 'test_attribute*' -m 'test_getattr*' -m 'test_lineno*' -m 'test_name_error*' -m 'test_recursion*' -m 'test_no_hang*'
0:00:00 load avg: 5.92 Run tests sequentially 0:00:00 load avg: 5.92 [1/1] test_exceptions beginning 6 repetitions 123456 ...... test_exceptions leaked [2, 2, 2] references, sum=6 test_exceptions leaked [2, 2, 2] memory blocks, sum=6 test_exceptions failed (reference leak)
== Tests result: FAILURE ==
1 test failed: test_exceptions
1 re-run test: test_exceptions
Total duration: 11.8 sec Tests result: FAILURE 12.13 real 11.44 user 0.61 sys
Author: Łukasz Langa (lukasz.langa) *
Date: 2021-08-11 16:44
(BTW, with this command, it is indeed possible to trigger the ref leak on main
as well, which makes it somewhat less scary.)
Author: Irit Katriel (iritkatriel) *
Date: 2021-08-11 18:48
It seems to have been introduced by https://github.com/python/cpython/pull/27626
I don't see how the code change could have introduced a leak. Maybe the new tests exposed a pre-existing leak?
Author: Łukasz Langa (lukasz.langa) *
Date: 2021-08-11 22:35
Entirely possible, Irit. We've had just this sort of case just last week! See: https://bugs.python.org/issue44524#msg399157
Author: Irit Katriel (iritkatriel) *
Date: 2021-08-12 00:14
I've narrowed it down a bit more and created .
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2021-08-12 01:56
New changeset f08e6d1bb3c5655f184af88c6793e90908bb6338 by Benjamin Peterson in branch 'main': bpo-33930: Fix typo in the test name. (#27733) https://github.com/python/cpython/commit/f08e6d1bb3c5655f184af88c6793e90908bb6338
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2021-08-12 02:17
New changeset d7f5796a1ec7ba223f6a844d7580559abef05238 by Miss Islington (bot) in branch '3.8': bpo-33930: Fix typo in the test name. (GH-27735) https://github.com/python/cpython/commit/d7f5796a1ec7ba223f6a844d7580559abef05238
Author: miss-islington (miss-islington)
Date: 2021-08-12 02:20
New changeset f7635f0e542c916bfd62542e7b60ee23ff681d0d by Miss Islington (bot) in branch '3.9': [3.9] bpo-33930: Fix typo in the test name. (GH-27733) (GH-27734) https://github.com/python/cpython/commit/f7635f0e542c916bfd62542e7b60ee23ff681d0d
Author: Łukasz Langa (lukasz.langa) *
Date: 2021-08-13 09:44
Since the refleak investigation moved to its own issue, I'm re-closing this one for clarity.
Author: miss-islington (miss-islington)
Date: 2021-08-13 09:45
New changeset ef36dfe4de1ee0df11cde94fd76465336aa8141d by Benjamin Peterson in branch '3.10': [3.10] bpo-33930: Fix typo in the test name. (GH-27736) https://github.com/python/cpython/commit/ef36dfe4de1ee0df11cde94fd76465336aa8141d
History
Date
User
Action
Args
2022-04-11 14:59:02
admin
set
github: 78111
2021-08-13 09:45:17
miss-islington
set
messages: +
2021-08-13 09:44:37
lukasz.langa
set
status: open -> closed
resolution: fixed
messages: +
stage: patch review -> resolved
2021-08-12 02:20:29
miss-islington
set
messages: +
2021-08-12 02:17:31
benjamin.peterson
set
messages: +
2021-08-12 02:02:12
benjamin.peterson
set
pull_requests: + <pull%5Frequest26216>
2021-08-12 01:57:01
miss-islington
set
pull_requests: + <pull%5Frequest26215>
2021-08-12 01:56:56
miss-islington
set
pull_requests: + <pull%5Frequest26214>
2021-08-12 01:56:50
benjamin.peterson
set
messages: +
2021-08-12 01:33:31
benjamin.peterson
set
nosy: + benjamin.peterson
pull_requests: + <pull%5Frequest26213>
2021-08-12 00:14:32
iritkatriel
set
messages: +
2021-08-11 22:35:26
lukasz.langa
set
messages: +
2021-08-11 18:48:28
iritkatriel
set
messages: +
2021-08-11 16:44:31
lukasz.langa
set
messages: +
2021-08-11 16:37:50
lukasz.langa
set
messages: +
2021-08-11 15:57:33
lukasz.langa
set
messages: +
2021-08-11 15:52:03
lukasz.langa
set
status: closed -> open
versions: + Python 3.9, Python 3.10
messages: +
resolution: fixed -> (no value)
stage: resolved -> patch review
2021-08-11 15:35:46
lukasz.langa
set
messages: +
2021-08-11 09:04:53
lukasz.langa
set
nosy: + lukasz.langa
pull_requests: + <pull%5Frequest26204>
2021-08-11 08:33:31
vstinner
set
status: open -> closed
resolution: fixed
messages: +
stage: patch review -> resolved
2021-08-11 08:32:48
vstinner
set
messages: +
2021-08-11 08:32:28
vstinner
set
messages: +
2021-08-10 23:34:56
miss-islington
set
pull_requests: + <pull%5Frequest26203>
2021-08-10 23:34:40
pablogsal
set
messages: +
2021-08-10 23:34:39
miss-islington
set
nosy: + miss-islington
pull_requests: + <pull%5Frequest26202>
2021-08-10 21:13:16
nascheme
set
nosy: + nascheme
messages: +
2021-08-09 13:30:33
pablogsal
set
stage: patch review
pull_requests: + <pull%5Frequest26165>
2021-08-09 13:22:02
vstinner
set
messages: +
2021-08-06 18:43:07
iritkatriel
set
nosy: + iritkatriel
messages: +
versions: + Python 3.11, - Python 3.6
2018-11-30 12:54:04
methane
set
messages: +
2018-11-30 12:17:21
serhiy.storchaka
set
messages: +
2018-11-30 11:57:25
methane
set
nosy: + methane
messages: +
2018-11-28 22:53:22
pablogsal
set
messages: +
2018-11-28 08:27:46
serhiy.storchaka
set
messages: +
2018-11-28 00:07:59
ronaldoussoren
set
messages: +
2018-11-27 13:59:26
vstinner
set
messages: +
2018-11-27 12:19:41
augustogoulart
set
files: + bpo-33930.patch
keywords: + patch
messages: +
2018-11-13 14:30:38
augustogoulart
set
messages: +
2018-11-13 13:14:57
augustogoulart
set
nosy: + taleinat
2018-11-13 13:13:34
augustogoulart
set
nosy: + augustogoulart
messages: +
2018-08-14 22:02:17
pablogsal
set
nosy: + pablogsal
2018-06-26 19:10:54
ronaldoussoren
set
messages: +
2018-06-26 16:55:00
a-j-buxton
set
messages: +
2018-06-26 08:14:23
vstinner
set
messages: +
2018-06-26 00:13:31
rhettinger
set
nosy: + rhettinger
messages: +
2018-06-25 12:58:00
vstinner
set
nosy: + vstinner
2018-06-25 12:15:00
serhiy.storchaka
set
messages: +
2018-06-25 10:58:56
ronaldoussoren
set
nosy: + ronaldoussoren
messages: +
2018-06-22 20:28:45
terry.reedy
set
nosy: + serhiy.storchaka, terry.reedy
messages: +
2018-06-21 11:27:57
a-j-buxton
create