Issue 17703: Trashcan mechanism segfault during interpreter finalization in Python 2.7.4 (original) (raw)

process

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: Stian Lode, amaury.forgeotdarc, benjamin.peterson, flex.plexico, jdemeyer, larry, lemburg, m_python, ned.deily, pitrou, python-dev, trevorj, vstinner
Priority: release blocker Keywords: patch

Created on 2013-04-12 13:43 by lemburg, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
tstate_trashcan.patch pitrou,2013-04-12 18:00
bt.txt Stian Lode,2015-08-12 11:38 Stacktrace of segfault.
Messages (25)
msg186627 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 13:43
A user reported a segfault when using our mxURL extension with Python 2.7.4. Previous Python versions do not show this behavior, so it's new in Python 2.7.4. The extension uses an Py_AtExit() function to clean up among other things a reference to a dictionary from another module. The dictionary deallocation then causes the segfault: $ gdb /usr/local/bin/python2.7 GNU gdb (GDB) 7.0.1-debian Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html > This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /usr/local/bin/python2.7...done. (gdb) r Starting program: /usr/local/bin/python2.7 [Thread debugging using libthread_db enabled] Python 2.7.4 (default, Apr 8 2013, 15:51:19) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import mx.URL >>> import sys >>> sys.exit() Program received signal SIGSEGV, Segmentation fault. 0x08091201 in dict_dealloc (mp=0xb7b2813c) at Objects/dictobject.c:1005 1005 Py_TRASHCAN_SAFE_BEGIN(mp) (gdb) bt #0 0x08091201 in dict_dealloc (mp=0xb7b2813c) at Objects/dictobject.c:1005 #1 0xb7875928 in mxURLModule_Cleanup () at mx/URL/mxURL/mxURL.c:2789 #2 0x0810553f in call_ll_exitfuncs () at Python/pythonrun.c:1763 #3 Py_Finalize () at Python/pythonrun.c:554 #4 0x08104bac in Py_Exit () at Python/pythonrun.c:1772 #5 handle_system_exit () at Python/pythonrun.c:1146 #6 0x0810517d in PyErr_PrintEx (set_sys_last_vars=) at Python/pythonrun.c:1156 #7 0x081058dd in PyRun_InteractiveOneFlags (fp=0xb7f8b420, filename=0x815f9c0 "", flags=0xbffffc4c) at Python/pythonrun.c:855 #8 0x08105a68 in PyRun_InteractiveLoopFlags (fp=0xb7f8b420, filename=0x815f9c0 "", flags=0xbffffc4c) at Python/pythonrun.c:772 #9 0x081062f2 in PyRun_AnyFileExFlags (fp=0xb7f8b420, filename=0x815f9c0 "", closeit=0, flags=0xbffffc4c) at Python/pythonrun.c:741 #10 0x0805bb59 in Py_Main (argc=1, argv=0xbffffd34) at Modules/main.c:640 #11 0x0805abeb in main (argc=1, argv=0xbffffd34) at ./Modules/python.c:23 (gdb)
msg186628 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 13:45
After a closer look at recent checkins, I found this checking for the trash can mechanism: 5a2ef447b80d (ticket #13992). This appears to be the cause: 1.20 #define Py_TRASHCAN_SAFE_BEGIN(op) \ 1.21 - if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL) { \ 1.22 - ++_PyTrash_delete_nesting; 1.23 - /* The body of the deallocator is here. */ 1.24 + do { \ 1.25 + PyThreadState *_tstate = PyThreadState_GET(); \ 1.26 + if (_tstate->trash_delete_nesting < PyTrash_UNWIND_LEVEL) { \ 1.27 + ++_tstate->trash_delete_nesting; 1.28 + /* The body of the deallocator is here. */ At the time the Py_AtExit functions are called, the thread state is NULL, so the if (_tstate->...) segfaults.
msg186629 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 13:47
I think a solution to the problem would be to check _tstate for NULL and only use it if it is non-NULL - without threads you don't need to keep track of them ;-)
msg186630 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-04-12 14:00
> At the time the Py_AtExit functions are called, the thread state is NULL I'd say this is the root cause. It's a bad thing to call Py_DECREF without a thread state. Was it the case in previous versions?
msg186636 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 14:23
On 12.04.2013 16:00, Amaury Forgeot d'Arc wrote: > > Amaury Forgeot d'Arc added the comment: > >> At the time the Py_AtExit functions are called, the thread state is NULL > > I'd say this is the root cause. It's a bad thing to call Py_DECREF without a thread state. > Was it the case in previous versions? The extension has been using this ever since it was written many years ago and without any problems in all Python versions up, but not including 2.7.4. It's the only way to do module cleanup in Python 2.x. So far, we've only seen the problem for dictionaries that are DECREFed. I know that things may go wrong when DECREF'ing objects this late in the finalization process, but still expect Python to handle errors gracefully without segfaulting, as it has been the case for all previous Python versions.
msg186644 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 15:32
Judging by the fact that the Py_AtExit funcs are called at the very end of Py_Finalize (after absolutely everything has been cleaned up), I think you shouldn't use any Python facilities at this point.
msg186645 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 15:39
By the way, why do you try to clean up the dictionary at this point? Any non-trivial deallocation code (e.g. __del__ method) will fail running.
msg186646 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-12 15:41
On 12.04.2013 17:32, Antoine Pitrou wrote: > > Judging by the fact that the Py_AtExit funcs are called at the very end of Py_Finalize (after absolutely everything has been cleaned up), I think you shouldn't use any Python facilities at this point. As mentioned earlier: this is the only way to cleanup extension modules in Python 2.x and the trash can patch broke this. > By the way, why do you try to clean up the dictionary at this point? Any non-trivial deallocation code (e.g. __del__ method) will fail running. The dictionaries just contain simple Python types that don't have __del__ methods, so deallocation works fine and it prevents memory leaks when embedding the Python interpreter and restarting it several times.
msg186647 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 15:48
> As mentioned earlier: this is the only way to cleanup extension > modules > in Python 2.x and the trash can patch broke this. Well, the doc clearly says "Since Python’s internal finalization will have completed before the cleanup function, no Python APIs should be called by func". Why wouldn't you clean up the dict (if it exists) at module initialization instead? You will have a working Python runtime at this point. That said, we could hack a 2.7-specific hack to the patch, in order to disable the trashcan mechanism at shutdown.
msg186656 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-12 18:00
Marc-André, does this patch work for you?
msg186915 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-14 11:59
On 12.04.2013 20:00, Antoine Pitrou wrote: > > Marc-André, does this patch work for you? > > Added file: http://bugs.python.org/file29791/tstate_trashcan.patch Thanks, Antoine. I can try this tomorrow.
msg186919 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-04-14 13:26
Yes, let's not break thing in point releases.
msg186975 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-15 11:20
Checked the patch: it fixes the problem. Thanks. Will this go into Python 2.7.5 ? I'm asking because we need to issue a patch level release of egenix-mx-base and if Python 2.7.5 will fix the problem, we'll just add the work-around for Python 2.7.4. Regarding a better cleanup logic: This is available in Python 3.x with the new module init logic and we'll use it there. PS: The approach with doing the cleanup at module startup time won't work, because the still existing objects will reference parts of the already cleaned up interpreter instance.
msg187015 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-15 19:20
New changeset e814fbd470bf by Antoine Pitrou in branch '2.7': Issue #17703: Fix a regression where an illegal use of Py_DECREF() after interpreter finalization can cause a crash. http://hg.python.org/cpython/rev/e814fbd470bf
msg187016 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-15 19:21
Committed!
msg187020 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-04-15 19:26
On 15.04.2013 21:21, Antoine Pitrou wrote: > > Committed! Cool, thanks for the quick turnaround.
msg224511 - (view) Author: Guillaume Matte (flex.plexico) Date: 2014-08-01 17:42
It does still cause a Fatal Error in debug build as PyThreadState_GET() error out if _PyThreadState_Current is NULL
msg241412 - (view) Author: Daniel (m_python) Date: 2015-04-18 12:06
Guillaume already mentioned this, its still causing a Fatal Error. To fix this PyThreadState_GET() in Py_TRASHCAN_SAFE_BEGIN must be replaced with _PyThreadState_Current #define Py_TRASHCAN_SAFE_BEGIN(op) \ do { \ PyThreadState *_tstate = _PyThreadState_Current; \
msg248453 - (view) Author: Stian Lode (Stian Lode) Date: 2015-08-12 11:38
I'm seeing this bug in Python 3.4.2 as well, and the patch here (tstate_trashcan.patch) appears to fix it. I'm using boost 1.57.0, and Python was compiled on a vanilla rhel6 system.
msg248473 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-08-12 20:46
Can anyone else confirm this bug in 3.4?
msg248531 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-08-13 17:33
> I'm seeing this bug in Python 3.4.2 as well, and the patch here (tstate_trashcan.patch) appears to fix it. What is the context? Some specific C code?
msg332500 - (view) Author: Trevor Joynson (trevorj) Date: 2018-12-25 00:46
I can verify this bug does still in fact exist in all 3.x versions I've tested (3.4-3.7). It's been the cause of many long-standing segfaults we've been having at exit in applications that use OpenRAVE and/or boost::python::numeric.
msg332501 - (view) Author: Trevor Joynson (trevorj) Date: 2018-12-25 00:51
I can also confirm that the included patch does "workaround" the issue. A big pain point here for me is in determining what exit handler is causing this. GDB has been it's usual great help in determining information about the exact segfault, but the problem code itself is rather elusive since it's ran so much earlier and just causes this issue downstream later in execution. Unfortunately the extensions I'm dealing with also happen to take a lifetime to compile, exacerbated by the fact that they are dependent upon one another. Can we please apply this to 3.x's tree?
msg333146 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-07 12:09
This issue is closed. Can you please open a new issue? Try to describe the bug that you have and what you attemped to fix it.
msg335478 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-02-13 21:32
See also https://bugs.python.org/issue35983 for another trashcan-related issue.
History
Date User Action Args
2022-04-11 14:57:44 admin set github: 61903
2019-02-13 21:32:31 jdemeyer set nosy: + jdemeyermessages: +
2019-01-07 12:09:18 vstinner set messages: +
2018-12-25 00:51:35 trevorj set messages: +
2018-12-25 00:46:46 trevorj set nosy: + trevorj, ned.deilymessages: + versions: + Python 3.6, - Python 3.4
2017-04-21 16:25:36 vstinner set nosy: + vstinner
2015-08-13 17:33:34 pitrou set messages: +
2015-08-12 20:46:15 larry set messages: +
2015-08-12 11:38:39 Stian Lode set files: + bt.txtversions: + Python 3.4, - Python 2.7nosy: + Stian Lode, larrymessages: +
2015-04-18 12:06:50 m_python set nosy: + m_pythonmessages: +
2014-08-01 17:42:14 flex.plexico set nosy: + flex.plexicomessages: +
2013-04-15 19:26:41 lemburg set messages: +
2013-04-15 19:21:03 pitrou set status: open -> closedresolution: fixedmessages: + stage: commit review -> resolved
2013-04-15 19:20:26 python-dev set nosy: + python-devmessages: +
2013-04-15 19:13:19 pitrou set type: crashstage: commit review
2013-04-15 19:13:11 pitrou set title: Trash can mechanism segfault during interpreter finalization in Python 2.7.4 -> Trashcan mechanism segfault during interpreter finalization in Python 2.7.4
2013-04-15 11:20:45 lemburg set messages: +
2013-04-14 13:26:43 benjamin.peterson set messages: +
2013-04-14 11:59:43 lemburg set messages: +
2013-04-14 11:40:55 pitrou set priority: normal -> release blockernosy: + benjamin.peterson
2013-04-12 18:00:15 pitrou set files: + tstate_trashcan.patchkeywords: + patchmessages: +
2013-04-12 15:48:44 pitrou set messages: +
2013-04-12 15:41:58 lemburg set messages: +
2013-04-12 15:39:27 pitrou set messages: +
2013-04-12 15:32:09 pitrou set messages: +
2013-04-12 14:23:23 lemburg set messages: +
2013-04-12 14:00:40 amaury.forgeotdarc set nosy: + amaury.forgeotdarcmessages: +
2013-04-12 13:47:32 lemburg set messages: +
2013-04-12 13:45:37 lemburg set messages: +
2013-04-12 13:43:12 lemburg create