Issue 33930: Segfault with deep recursion into object().dir (original) (raw)

Created on 2018-06-21 11:27 by a-j-buxton, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (38)

msg320167 - (view)

Author: Alistair Buxton (a-j-buxton)

Date: 2018-06-21 11:27

The following small snippet of code will crash 3.6.5 with a segfault.

The crash occurs at cleanup, so it won't happen in the interactive interpreter (until you exit).

--- code ---

o = object() for x in range(1000000): o = o.dir print(x, id(o.dir))

--- end code ---

msg320263 - (view)

Author: Terry J. Reedy (terry.reedy) * (Python committer)

Date: 2018-06-22 20:28

Serhiy, is this one of the cases we decided to leave as won't fix?

msg320413 - (view)

Author: Ronald Oussoren (ronaldoussoren) * (Python committer)

Date: 2018-06-25 10:58

The crash is likely caused by recursion during the clean-up of the object().dir.dir.... chain.

The trashcan API (see Py_TRASHCAN_SAFE_BEGIN/Py_TRASHCAN_SAFE_END in Include/object.h) can help here.

Those do some real work and have a cost, I can't predict if that cost is substational enough to affect benchmarks (method objects are destroyed a lot).

msg320418 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2018-06-25 12:15

The trashcan mechanism is used for slot wrappers. The following code doesn't crash:

o = object() for x in range(1000000): o = o.hash

msg320458 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2018-06-26 00:13

Alistair, how did you happen upon this case?

msg320478 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2018-06-26 08:14

C traceback in gdb:

Program received signal SIGSEGV, Segmentation fault. 0x00000000004812a8 in meth_dealloc ( m=<error reading variable: Cannot access memory at address 0x7fffff7fefd8>) at Objects/methodobject.c:86 86 { (gdb) where #0 0x00000000004812a8 in meth_dealloc ( m=<error reading variable: Cannot access memory at address 0x7fffff7fefd8>) at Objects/methodobject.c:86 #1 0x0000000000488d9f in _Py_Dealloc ( op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973490>) at Objects/object.c:1931 #2 0x00000000004813c4 in meth_dealloc (m=0x7fffe3973580) at Objects/methodobject.c:91 #3 0x0000000000488d9f in _Py_Dealloc ( op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973508>) at Objects/object.c:1931 #4 0x00000000004813c4 in meth_dealloc (m=0x7fffe39735f8) at Objects/methodobject.c:91 #5 0x0000000000488d9f in _Py_Dealloc ( op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe3973580>) at Objects/object.c:1931 #6 0x00000000004813c4 in meth_dealloc (m=0x7fffe3973670) at Objects/methodobject.c:91 #7 0x0000000000488d9f in _Py_Dealloc ( op=<built-in method __dir__ of builtin_function_or_method object at remote 0x7fffe39735f8>) at Objects/object.c:1931 #8 0x00000000004813c4 in meth_dealloc (m=0x7fffe39736e8) at Objects/methodobject.c:91 (...)

msg320504 - (view)

Author: Alistair Buxton (a-j-buxton)

Date: 2018-06-26 16:55

I found this while chatting on IRC about the implementation details of objects - specifically, since objects have a dict of attributes, and dict subclasses object, it seems like every dict will have another dict inside it. (Eventually someone explained that base object() doesn't have any state at all, as a result dict doesn't have a dict of attributes, and there is no cycle.)

Anyway, as I was trying to explain my question to the other users, I had intended to post this snippet and ask "why doesn't this crash python?" Ironically it turned out that it does crash python.

msg320509 - (view)

Author: Ronald Oussoren (ronaldoussoren) * (Python committer)

Date: 2018-06-26 19:10

@victor: thanks for confirming my analysis.

The trashcan API does fix this issue, although I did have to change "_PyObject_GC_UNTRACK(m);" to "PyObject_GC_UnTrack(m);" as well. I haven't checked yet why that's necessary.

msg329839 - (view)

Author: Augusto Goulart (augustogoulart) *

Date: 2018-11-13 13:13

ronaldoussoren, I have tested (Debian 9) substituting "_PyObject_GC_UNTRACK(m);" to "PyObject_GC_UnTrack(m);", and although the tests pass, I'm can still see the segfault error.

diff --git a/Objects/methodobject.c b/Objects/methodobject.c index cfea8cf410..cb8e279c7d 100644 --- a/Objects/methodobject.c +++ b/Objects/methodobject.c @@ -84,7 +84,7 @@ PyCFunction_GetFlags(PyObject *op) static void meth_dealloc(PyCFunctionObject *m) {

msg329846 - (view)

Author: Augusto Goulart (augustogoulart) *

Date: 2018-11-13 14:30

Sorry, I forgot to include the trashcan API. I'll double check and update the issue.

msg330516 - (view)

Author: Augusto Goulart (augustogoulart) *

Date: 2018-11-27 12:19

After a couple of rounds, this is the benchmark result I got for the attached patch. It seems to be a simple, non-performance degrading fix.

python -m perf compare_to origin.json patch.json -G --min-speed=2 Slower (1):

Faster (9):

Benchmark hidden because not significant (47): 2to3, chaos, crypto_pyaes, deltablue, dulwich_log, go, html5lib, logging_silent, logging_simple, mako, meteor_contest, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pickle_pure_python, pidigits, python_startup, python_startup_no_site, regex_compile, regex_dna, regex_v8, richards, scimark_fft, scimark_lu, scimark_monte_carlo, scimark_sor, scimark_sparse_mat_mult, spectral_norm, sqlalchemy_imperative, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

msg330520 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2018-11-27 13:59

Can someone please explain me why this change has any impact? Does it mean that _PyObject_GC_UNTRACK() macro has a bug?

The function is quite straighforward :-)

void PyObject_GC_Track(void *op) { _PyObject_GC_TRACK(op); }

msg330563 - (view)

Author: Ronald Oussoren (ronaldoussoren) * (Python committer)

Date: 2018-11-28 00:07

I don’t understand this either. For some reason the patch works when using the function and not when using the macro.

I haven’t looked into this yet, and am currently traveling. I should have time to look into this starting December 5th when I’m back home.

msg330579 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2018-11-28 08:27

It is hard to get a noticeable difference in macrobenchmarks. I want to see results of microbenchmarks.

For tuples, the trashcan mechanism is the main culprit of the performance hit when use short-living tuples for passing arguments. For bound method objects the relative effect should be smaller, because looking up the method is more costly than building a tuple, and less bound method objects are created after implementing the method call optimization. But in any case we should know the maximal cost of this change before merging it.

msg330635 - (view)

Author: Pablo Galindo Salgado (pablogsal) * (Python committer)

Date: 2018-11-28 22:53

Can someone please explain me why this change has any impact? Does it mean that _PyObject_GC_UNTRACK() macro has a bug?

Interestingly, this lonely and perplexing comment is left in PyObject_GC_UnTrack (the function):

void PyObject_GC_UnTrack(void op) { / Obscure: the Py_TRASHCAN mechanism requires that we be able to * call PyObject_GC_UnTrack twice on an object. */ if (_PyObject_GC_IS_TRACKED(op)) { _PyObject_GC_UNTRACK(op); } }

msg330776 - (view)

Author: Inada Naoki (methane) * (Python committer)

Date: 2018-11-30 11:57

microbench:

$ ./python.patched -m perf timeit --compare-to ./python.master -s 'o="foo"' 'o.encode' python.master: ..................... 29.3 ns +- 0.6 ns python.patched: ..................... 30.9 ns +- 1.3 ns

Mean +- std dev: [python.master] 29.3 ns +- 0.6 ns -> [python.patched] 30.9 ns +- 1.3 ns: 1.05x slower (+5%)

And this is perf report of python.patched:

33.17% _PyEval_EvalFrameDefault 15.25% _PyObject_GenericGetAttrWithDict 9.63% PyCFunction_NewEx 8.25% _PyType_Lookup 6.42% meth_dealloc 3.50% _Py_bytes_contains 3.22% PyObject_GC_UnTrack 3.20% method_get 2.73% PyObject_GetAttr 2.55% repeat_next 0.88% _Py_Dealloc 0.81% PyObject_GenericGetAttr 0.78% sre_ucs1_match 0.50% lookdict_unicode_nodummy

msg330780 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2018-11-30 12:17

Thank you Inada. There is a large overhead for iterating. Try to add --duplicate=1000.

msg330785 - (view)

Author: Inada Naoki (methane) * (Python committer)

Date: 2018-11-30 12:54

$ ./python.patched -m perf timeit --compare-to ./python.master -s 'o="foo"' 'o.encode' --duplicate=10000 python.master: ..................... 23.1 ns +- 0.5 ns python.patched: ..................... 24.6 ns +- 0.5 ns

Mean +- std dev: [python.master] 23.1 ns +- 0.5 ns -> [python.patched] 24.6 ns +- 0.5 ns: 1.06x slower (+6%)

$ perf record ./python.patched -m perf timeit -s 'o="foo"' 'o.encode' --duplicate=10000 Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. ..................... Mean +- std dev: 24.9 ns +- 0.7 ns [ perf record: Woken up 9 times to write data ] [kernel.kallsyms] with build id debb7f6ce8aad7263cc6564650a88db5c8003389 not found, continuing without symbols [ perf record: Captured and wrote 2.491 MB perf.data (61787 samples) ]

$ perf report | cat (snip)

Total Lost Samples: 0

Samples: 61K of event 'cycles:ppp'

Event count (approx.): 52880893198

Overhead Command Shared Object Symbol

........ .............. ................................................ ..........................................

18.78%  python.patched  python.patched                                    [.] _PyEval_EvalFrameDefault
14.85%  python.patched  python.patched                                    [.] _PyObject_GenericGetAttrWithDict
 9.93%  python.patched  python.patched                                    [.] PyCFunction_NewEx
 8.56%  python.patched  python.patched                                    [.] _PyType_Lookup
 6.61%  python.patched  python.patched                                    [.] meth_dealloc
 4.12%  python.patched  python.patched                                    [.] PyParser_AddToken
 3.36%  python.patched  python.patched                                    [.] PyObject_GetAttr
 3.33%  python.patched  python.patched                                    [.] PyObject_GC_UnTrack
 3.25%  python.patched  python.patched                                    [.] method_get
 2.92%  python.patched  python.patched                                    [.] _Py_bytes_contains
 2.13%  python.patched  python.patched                                    [.] freechildren
 1.41%  python.patched  python.patched                                    [.] pymalloc_alloc.isra.6.part.7

msg399125 - (view)

Author: Irit Katriel (iritkatriel) * (Python committer)

Date: 2021-08-06 18:43

I've reproduced the segfault on 3.11 on a Mac.

msg399265 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2021-08-09 13:22

Correctness matters more than performance for 1 nanosecond. IMO a fix like attached bpo-33930.patch looks like the good approach to fix the crash.

msg399357 - (view)

Author: Neil Schemenauer (nascheme) * (Python committer)

Date: 2021-08-10 21:13

I'm thinking that the explicit call to PyObject_GC_UnTrack should be made unnecessary by integrating it into the trashcan code. That way, we avoid someone else running into this kind of bug in the future. See bpo-44881.

msg399366 - (view)

Author: Pablo Galindo Salgado (pablogsal) * (Python committer)

Date: 2021-08-10 23:34

New changeset bfc2d5a5c4550ab3a2fadeb9459b4bd948ff61a2 by Pablo Galindo Salgado in branch 'main': bpo-33930: Fix segfault with deep recursion when cleaning method objects (GH-27678) https://github.com/python/cpython/commit/bfc2d5a5c4550ab3a2fadeb9459b4bd948ff61a2

msg399373 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2021-08-11 08:32

New changeset 15f0a45b2822607f2c38685a72d06f050e24b215 by Miss Islington (bot) in branch '3.9': bpo-33930: Fix segfault with deep recursion when cleaning method objects (GH-27678) (GH-27720) https://github.com/python/cpython/commit/15f0a45b2822607f2c38685a72d06f050e24b215

msg399374 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2021-08-11 08:32

New changeset d6d2d549972422ccc6fa335ebf9907e4b3d71817 by Miss Islington (bot) in branch '3.10': bpo-33930: Fix segfault with deep recursion when cleaning method objects (GH-27678) (GH-27719) https://github.com/python/cpython/commit/d6d2d549972422ccc6fa335ebf9907e4b3d71817

msg399375 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2021-08-11 08:33

Thanks for the fix Augusto Goulart and Pablo! I close the issue.

Pablo asked in PR 27678 if a Python 3.8 Release Manager (Lukasz) wants a backport to 3.8.

msg399402 - (view)

Author: Łukasz Langa (lukasz.langa) * (Python committer)

Date: 2021-08-11 15:35

New changeset 09d814b3833c2c852f4f233297b4561028782d2a by Łukasz Langa in branch '3.8': [3.8] bpo-33930: Fix segfault with deep recursion when cleaning method objects (GH-27678) (GH-27721) https://github.com/python/cpython/commit/09d814b3833c2c852f4f233297b4561028782d2a

msg399403 - (view)

Author: Łukasz Langa (lukasz.langa) * (Python committer)

Date: 2021-08-11 15:52

Apparently we've got ourselves refleaks with the patches.

It doesn't always reproduce but put test_exceptions -R: in a loop in 3.10 and soon enough you'll get:

0:00:00 load avg: 4.47 Run tests sequentially 0:00:00 load avg: 4.47 [1/1] test_exceptions beginning 9 repetitions 123456789 ......... test_exceptions leaked [2, 2, 2, 2] references, sum=8 test_exceptions leaked [2, 2, 2, 2] memory blocks, sum=8 test_exceptions failed (reference leak) in 59.6 sec

== Tests result: FAILURE ==

1 test failed: test_exceptions

It's much harder to reproduce on 3.9 and 3.8 because test_exceptions has fewer tests and at least on my MBP finishes twice as fast. I ran 3.9 in a loop for an hour and no refleak.

msg399404 - (view)

Author: Łukasz Langa (lukasz.langa) * (Python committer)

Date: 2021-08-11 15:57

Interestingly, I can't reproduce the refleaks on main either, only on 3.10.

msg399408 - (view)

Author: Łukasz Langa (lukasz.langa) * (Python committer)

Date: 2021-08-11 16:37

I gotta run for the day. The shortest way for me to reproduce is to focus on the tests that were added in 3.10 and shortening the repetitions a little:

$ time ./python.exe -E -Wd -m test test_exceptions -R3:3 -m 'test_attribute*' -m 'test_getattr*' -m 'test_lineno*' -m 'test_name_error*' -m 'test_recursion*' -m 'test_no_hang*'

0:00:00 load avg: 5.92 Run tests sequentially 0:00:00 load avg: 5.92 [1/1] test_exceptions beginning 6 repetitions 123456 ...... test_exceptions leaked [2, 2, 2] references, sum=6 test_exceptions leaked [2, 2, 2] memory blocks, sum=6 test_exceptions failed (reference leak)

== Tests result: FAILURE ==

1 test failed: test_exceptions

1 re-run test: test_exceptions

Total duration: 11.8 sec Tests result: FAILURE 12.13 real 11.44 user 0.61 sys

msg399409 - (view)

Author: Łukasz Langa (lukasz.langa) * (Python committer)

Date: 2021-08-11 16:44

(BTW, with this command, it is indeed possible to trigger the ref leak on main as well, which makes it somewhat less scary.)

msg399411 - (view)

Author: Irit Katriel (iritkatriel) * (Python committer)

Date: 2021-08-11 18:48

It seems to have been introduced by https://github.com/python/cpython/pull/27626

I don't see how the code change could have introduced a leak. Maybe the new tests exposed a pre-existing leak?

msg399421 - (view)

Author: Łukasz Langa (lukasz.langa) * (Python committer)

Date: 2021-08-11 22:35

Entirely possible, Irit. We've had just this sort of case just last week! See: https://bugs.python.org/issue44524#msg399157

msg399425 - (view)

Author: Irit Katriel (iritkatriel) * (Python committer)

Date: 2021-08-12 00:14

I've narrowed it down a bit more and created .

msg399429 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2021-08-12 01:56

New changeset f08e6d1bb3c5655f184af88c6793e90908bb6338 by Benjamin Peterson in branch 'main': bpo-33930: Fix typo in the test name. (#27733) https://github.com/python/cpython/commit/f08e6d1bb3c5655f184af88c6793e90908bb6338

msg399430 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2021-08-12 02:17

New changeset d7f5796a1ec7ba223f6a844d7580559abef05238 by Miss Islington (bot) in branch '3.8': bpo-33930: Fix typo in the test name. (GH-27735) https://github.com/python/cpython/commit/d7f5796a1ec7ba223f6a844d7580559abef05238

msg399431 - (view)

Author: miss-islington (miss-islington)

Date: 2021-08-12 02:20

New changeset f7635f0e542c916bfd62542e7b60ee23ff681d0d by Miss Islington (bot) in branch '3.9': [3.9] bpo-33930: Fix typo in the test name. (GH-27733) (GH-27734) https://github.com/python/cpython/commit/f7635f0e542c916bfd62542e7b60ee23ff681d0d

msg399511 - (view)

Author: Łukasz Langa (lukasz.langa) * (Python committer)

Date: 2021-08-13 09:44

Since the refleak investigation moved to its own issue, I'm re-closing this one for clarity.

msg399512 - (view)

Author: miss-islington (miss-islington)

Date: 2021-08-13 09:45

New changeset ef36dfe4de1ee0df11cde94fd76465336aa8141d by Benjamin Peterson in branch '3.10': [3.10] bpo-33930: Fix typo in the test name. (GH-27736) https://github.com/python/cpython/commit/ef36dfe4de1ee0df11cde94fd76465336aa8141d

History

Date

User

Action

Args

2022-04-11 14:59:02

admin

set

github: 78111

2021-08-13 09:45:17

miss-islington

set

messages: +

2021-08-13 09:44:37

lukasz.langa

set

status: open -> closed
resolution: fixed
messages: +

stage: patch review -> resolved

2021-08-12 02:20:29

miss-islington

set

messages: +

2021-08-12 02:17:31

benjamin.peterson

set

messages: +

2021-08-12 02:02:12

benjamin.peterson

set

pull_requests: + <pull%5Frequest26216>

2021-08-12 01:57:01

miss-islington

set

pull_requests: + <pull%5Frequest26215>

2021-08-12 01:56:56

miss-islington

set

pull_requests: + <pull%5Frequest26214>

2021-08-12 01:56:50

benjamin.peterson

set

messages: +

2021-08-12 01:33:31

benjamin.peterson

set

nosy: + benjamin.peterson
pull_requests: + <pull%5Frequest26213>

2021-08-12 00:14:32

iritkatriel

set

messages: +

2021-08-11 22:35:26

lukasz.langa

set

messages: +

2021-08-11 18:48:28

iritkatriel

set

messages: +

2021-08-11 16:44:31

lukasz.langa

set

messages: +

2021-08-11 16:37:50

lukasz.langa

set

messages: +

2021-08-11 15:57:33

lukasz.langa

set

messages: +

2021-08-11 15:52:03

lukasz.langa

set

status: closed -> open
versions: + Python 3.9, Python 3.10
messages: +

resolution: fixed -> (no value)
stage: resolved -> patch review

2021-08-11 15:35:46

lukasz.langa

set

messages: +

2021-08-11 09:04:53

lukasz.langa

set

nosy: + lukasz.langa

pull_requests: + <pull%5Frequest26204>

2021-08-11 08:33:31

vstinner

set

status: open -> closed
resolution: fixed
messages: +

stage: patch review -> resolved

2021-08-11 08:32:48

vstinner

set

messages: +

2021-08-11 08:32:28

vstinner

set

messages: +

2021-08-10 23:34:56

miss-islington

set

pull_requests: + <pull%5Frequest26203>

2021-08-10 23:34:40

pablogsal

set

messages: +

2021-08-10 23:34:39

miss-islington

set

nosy: + miss-islington
pull_requests: + <pull%5Frequest26202>

2021-08-10 21:13:16

nascheme

set

nosy: + nascheme
messages: +

2021-08-09 13:30:33

pablogsal

set

stage: patch review
pull_requests: + <pull%5Frequest26165>

2021-08-09 13:22:02

vstinner

set

messages: +

2021-08-06 18:43:07

iritkatriel

set

nosy: + iritkatriel

messages: +
versions: + Python 3.11, - Python 3.6

2018-11-30 12:54:04

methane

set

messages: +

2018-11-30 12:17:21

serhiy.storchaka

set

messages: +

2018-11-30 11:57:25

methane

set

nosy: + methane
messages: +

2018-11-28 22:53:22

pablogsal

set

messages: +

2018-11-28 08:27:46

serhiy.storchaka

set

messages: +

2018-11-28 00:07:59

ronaldoussoren

set

messages: +

2018-11-27 13:59:26

vstinner

set

messages: +

2018-11-27 12:19:41

augustogoulart

set

files: + bpo-33930.patch
keywords: + patch
messages: +

2018-11-13 14:30:38

augustogoulart

set

messages: +

2018-11-13 13:14:57

augustogoulart

set

nosy: + taleinat

2018-11-13 13:13:34

augustogoulart

set

nosy: + augustogoulart
messages: +

2018-08-14 22:02:17

pablogsal

set

nosy: + pablogsal

2018-06-26 19:10:54

ronaldoussoren

set

messages: +

2018-06-26 16:55:00

a-j-buxton

set

messages: +

2018-06-26 08:14:23

vstinner

set

messages: +

2018-06-26 00:13:31

rhettinger

set

nosy: + rhettinger
messages: +

2018-06-25 12:58:00

vstinner

set

nosy: + vstinner

2018-06-25 12:15:00

serhiy.storchaka

set

messages: +

2018-06-25 10:58:56

ronaldoussoren

set

nosy: + ronaldoussoren
messages: +

2018-06-22 20:28:45

terry.reedy

set

nosy: + serhiy.storchaka, terry.reedy
messages: +

2018-06-21 11:27:57

a-j-buxton

create