Issue 1856: shutdown (exit) can hang or segfault with daemon threads running (original) (raw)
Created on 2008-01-17 02:01 by gregory.p.smith, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (32)
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2008-01-17 02:01
This probably applies to 3.0 as well but i have not tested it there.
Here are some sample failures:
=========== A ============== Exception in thread Thread-0000000001 (most likely raised during interpreter shutdown):Exception in thread Thread-0000000003 (most likely raised during interpreter shutdown):
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xe4cf1bb0 (LWP 22281)] 0x080c0824 in PyEval_EvalFrameEx (f=0x820d59c, throwflag=0) at Python/ceval.c:2633 2633 if (tstate->frame->f_exc_type != NULL)
=========== B ============== Exception in thread Thread-0000000001 (most likely raised during interpreter shutdown): Traceback (most recent call last): File "/home/gps/oss/python/trunk/Lib/threading.py", line 486, in __bootstrap_inner File "thread_exit.py", line 24, in run <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'add'
Program received signal SIGINT, Interrupt. [Switching to Thread 0xf7fe08e0 (LWP 21703)] 0xffffe410 in __kernel_vsyscall () (gdb) bt #0 0xffffe410 in __kernel_vsyscall () #1 0x4dac1af0 in sem_wait@GLIBC_2.0 () from /lib/tls/i686/cmov/libpthread.so.0 #2 0xf7fe08c0 in ?? () #3 0x080ec9d5 in PyThread_acquire_lock (lock=0x0, waitflag=1) at Python/thread_pthread.h:349 #4 0x080be6d5 in PyEval_RestoreThread (tstate=0x8168050) at Python/ceval.c:314 #5 0x0806caf5 in file_dealloc (f=0xf7f29d58) at Objects/fileobject.c:396 #6 0x0810be36 in frame_dealloc (f=0x8201f04) at Objects/frameobject.c:416 #7 0x080e28d2 in PyThreadState_Clear (tstate=0x8184068) at Python/pystate.c:217 #8 0x080e2994 in PyInterpreterState_Clear (interp=0x8168008) at Python/pystate.c:105 #9 0x080e473b in Py_Finalize () at Python/pythonrun.c:444 #10 0x080e428e in handle_system_exit () at Python/pythonrun.c:1641 #11 0x080e447a in PyErr_PrintEx (set_sys_last_vars=1) at Python/pythonrun.c:1087 #12 0x080e510d in PyRun_SimpleFileExFlags (fp=0x817bd98, filename=0xffffd73c "thread_exit.py", closeit=1, flags=0xffffd518) at Python/pythonrun.c:996 #13 0x0805753b in Py_Main (argc=1, argv=0xffffd5b4) at Modules/main.c:574 #14 0x4d95cea2 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6 #15 0x080566c1 in _start () at ../sysdeps/i386/elf/start.S:119 (gdb) bt #0 0x080c0824 in PyEval_EvalFrameEx (f=0x820d59c, throwflag=0) at Python/ceval.c:2633 #1 0x080c451c in PyEval_EvalFrameEx (f=0x820d40c, throwflag=0) at Python/ceval.c:3690 #2 0x080c451c in PyEval_EvalFrameEx (f=0x820d13c, throwflag=0) at Python/ceval.c:3690 #3 0x080c5fd4 in PyEval_EvalCodeEx (co=0xf7f22d58, globals=0xf7f962d4, locals=0x0, args=0xf7f46e18, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2876 #4 0x0810d4d6 in function_call (func=0xf7f41454, arg=0xf7f46e0c, kw=0x0) at Objects/funcobject.c:524 #5 0x0805b0b4 in PyObject_Call (func=0xf7f41454, arg=0x0, kw=0x0) at Objects/abstract.c:1872 #6 0x08061008 in instancemethod_call (func=0xf7f41454, arg=0xf7f46e0c, kw=0x0) at Objects/classobject.c:2509 #7 0x0805b0b4 in PyObject_Call (func=0xf7f3fdec, arg=0x0, kw=0x0) at Objects/abstract.c:1872 #8 0x080be8f9 in PyEval_CallObjectWithKeywords (func=0x0, arg=0xf7fa002c, kw=0x0) at Python/ceval.c:3473 #9 0x080efb4d in t_bootstrap (boot_raw=0x820c910) at ./Modules/threadmodule.c:422 #10 0x4dabd341 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 #11 0x4da114ee in clone () from /lib/tls/i686/cmov/libc.so.6
And as with all problems of this sort... sometimes the program exits normally without any problems.
I ran python trunk:60012 under gdb above. But these problems occur on older versions such as 2.4.
Author: Guido van Rossum (gvanrossum) *
Date: 2008-01-17 04:24
So can we definitely rule out that this could be caused by the recent changes to threading.py (r57216)?
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2008-01-17 07:06
Yes i believe it is unrelated to any recent change.
I can reproduce both behaviors on my OS X 10.4 dual core mac using Python 2.5.1.
Python 2.3 on the mac appears to get stuck in a loop when run stand alone but gets a memory access fault when run under gdb.
python 2.4.4 seems to hang most of the time.
(all behaviors are possible i expect, i just ran it by hand under several versions a few times)
The systems i ran it on when reporting the bug was SMP. As with many threading bugs it might be easier to reproduce on SMP systems but i haven't checked.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-01-17 20:44
I uploaded a script for a similar issue: http://bugs.python.org/issue1596321 (thread_crash.py) which sometimes segfaults on Windows XP (trunk version, debug build), with the same call stacks as printed by Gregory, on both threads.
I explain it this way: On interpreter shutdown, the main thread clears the other's thread TreadState. There you find the instruction: Py_CLEAR(tstate->frame); But this can call arbitrary python code on deallocation of locals, and release the GIL (file.close() in our case). The other thread can then continue to run. If it is given enough processor time before the whole process shutdowns, it will reach this line in ceval.c (line 2633) if (tstate->frame->f_exc_type != NULL) since tstate->frame has been cleared by the main thread ==> boom.
There can be several ways to solve this problem:
- While clearing a thread state, prevent the "owner" thread to gain focus. Only the main thread can "Py_END_ALLOW_THREADS". The daemon thread is blocked forever, and will die there.
- Be smarter when clearing tstate->frame: first call frame_clear(), which will free local variables, but let the frame in a valid state. Only after that, call Py_CLEAR(tstate->frame), when we are sure that no more thread switch can occur. Of course there are many other fields in a frame; we have to inspect them carefully. The first solution seems a more robust approach.
Author: Adam Olsen (Rhamphoryncus)
Date: 2008-01-17 21:00
I think non-main threads should kill themselves off if they grab the interpreter lock and the interpreter is tearing down. They're about to get killed off anyway, when the process exits.
PyGILState_Ensure would still be broken. It touches various things that get torn down (autoInterpreterState, autoTLSkey, and the HEAD_LOCK macros) before it grabs the GIL. Reordering should be possible of course.
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2008-01-17 21:32
agreed, during shutdown the other threads should be stopped. anything to do this complicates acquiring and releasing the GIL by adding another check to see if we're shutting down.
brainstorm: I haven't looked at the existing BEGIN_ALLOW_THREADS and END_ALLOW_THREADS implementations but would it be possible to modify them on the fly from the thread doing the shutdown (main or not) while it holds the GIL such that all future calls to BEGIN_ALLOW_THREADS do not actually release the GIL and END_ALLOW_THREADS always blocks. That should bring other threads to a halt pretty quickly and prevent destructors from releasing the GIL (file.close, etc).
Author: Adam Olsen (Rhamphoryncus)
Date: 2008-01-17 22:35
I'm not sure I understand you, Gregory. Are arguing in favour of adding extra logic to the GIL code, or against it?
I'm attaching a patch that has non-main thread exit, and it seems to fix the test case. It doesn't fix the PyGILState_Ensure problems though.
Also note that PyThread_exit_thread() was completely broken, becoming a no-op once threading was initialized, and even before then it would exit the process rather than just the thread. I've fixed it for pthreads, but not for any of the other platforms.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-01-17 22:39
PyGILState_Ensure would still be broken.
It touches various things that get torn down (autoInterpreterState, autoTLSkey, and the HEAD_LOCK macros) before it grabs the GIL. Reordering should be possible of course.
Adam, did you notice the change on revision 59231 ? the PyGILState_Ensure stuff should now remain valid during the PyInterpreterState_Clear() call.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-01-17 22:54
Adam, your patch cover one case of the thread releasing the GIL (Py_ALLOW_THREADS), but there are more calls to PyThread_acquire_lock (the checkinterval for example).
I have a competing patch: it makes the main thread never release the GIL after some point. Other threads stay blocked and don't clean themselves: the OS will take care of them.
Both approaches correct the initial problem, though. A remaining question is where exactly in Py_Finalize() we should ban other threads. I think that this point should be as late as possible, to allow some object/resources to be cleaned properly.
Author: Adam Olsen (Rhamphoryncus)
Date: 2008-01-17 23:04
Adam, did you notice the change on revision 59231 ? the PyGILState_Ensure stuff should now remain valid during the PyInterpreterState_Clear() call.
That doesn't matter. PyGILState_Ensure needs to remain valid forever. Only once the process is completely gone can we be sure it won't be called.
Note that PyGILState_Ensure has two behaviours: it can be called when your thread is already running python code and has a tstate, or it can be called when you have none. revision 59231 fixed the former, but only when Py_Finalize hasn't finished. It doesn't fix it for any daemon threads, or for dummy threads (created outside of python's control).
...
You're right, I did forget the 3 other places that acquire the interpreter_lock. The more I think about it the more I think our two approaches are equivalent, but conditionalizing the release means we don't need to touch the broken PyThread_exit_thread functions.
I think the banning should be as early as possible, right after call_sys_exitfunc() has finished. You can't do anything sane once the interpreter is half-gone (despite the precedent of trying anyway.)
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-01-17 23:23
That doesn't matter. PyGILState_Ensure needs to remain valid forever. Only once the process is completely gone can we be sure it won't be called.
We could apply the same idea: when exiting, PyGILState_Ensure() blocks forever, except for the main thread of course.
Note that all this state must be restartable: after Py_Finalize(), it should be possible to call Py_Initialize() again. This seems to raise the score of the "exit_thread" approach. I don't know if multiple interpreters are well supported, though. Is there somewhere a list of use cases, or a test script that can exercise this?
Author: Adam Olsen (Rhamphoryncus)
Date: 2008-01-17 23:34
PyGILState_Ensure WOULD block forever if it acquired the GIL before doing anything else.
The only way to make Py_Initialize callable after Py_Finalize is to make various bits of the finalization into no-ops. For instance, it's currently impossible to unload C extension modules, so they must be left around permanently. I'm not convinced it works currently, or that there's use cases for it.
Note that unloading python.so between Py_Finalize and Py_Initialize would definitely break things, as you'd lose all the global variables maintaining things. That eliminates the only "it's more elegant" argument.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-01-17 23:51
PyGILState_Ensure WOULD block forever if it acquired the GIL before doing anything else. Is it possible at all? PyThread_acquire_lock is not reentrant.
Author: Adam Olsen (Rhamphoryncus)
Date: 2008-01-18 00:12
Hrm. It seems you're right. Python needs thread-local data to determine if the GIL is held by the current thread. Thus, autoTLSkey and all that need to never be torn down. (The check could be done much more directly than the current PyThreadState_IsCurrent machinations, but I digress.)
Author: Adam Olsen (Rhamphoryncus)
Date: 2008-05-07 18:11
Cleaned up version of Amaury's patch. I stop releasing the GIL after sys.exitfunc is called, which protects threads from the ensuing teardown.
I also grab the import lock (and never release it). This should prevent the nasty issue with daemon threads doing imports mentioned in issue 1720705, so that part of the documentation change can be removed. I didn't do any significant testing though.
Importing raises a potential issue with this approach. The standard meme of "release GIL; grab lock; acquire GIL" becomes invalid. A child may grab the lock, then block on the GIL, while the main thread (which never releases the GIL) will block on acquiring the lock. I worked around it with the import lock due to it's recursive behaviour, but if it exists anywhere else in CPython it may cause a problem if used during shutdown.
Author: Georg Brandl (georg.brandl) *
Date: 2008-05-12 17:11
Closed #2077 as a duplicate.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-09-24 15:47
The threads don't have to be daemons: sys.exit() calls Py_Finalize() without waiting for non-daemons threads.
Author: Gabriel Genellina (ggenellina)
Date: 2009-02-09 20:30
I think applying Rhamphoryncus' patch in #1722344 fixes this too (that is, move WaitForThreadShutdown from the end of PyMain into Py_Finalize, so it is always called) But it should be tested on several platforms.
Author: Terry J. Reedy (terry.reedy) *
Date: 2010-08-03 18:09
Is this still an issue in 2.7 or 3.x? Or should it be closed?
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-01 19:53
It is certainly still a problem with 3.x, but I don't find a way to exhibit it here.
Author: Antoine Pitrou (pitrou) *
Date: 2010-09-20 11:55
See #9901 for a variation on the same global issue (running threads can access interpreter structures - the GIL - while the main thread is trying to destroy them).
Author: Antoine Pitrou (pitrou) *
Date: 2011-05-02 20:36
With Python 3.3, thread_exit.py still crashes more or less randomly:
$ ./python thread_exit.py [49205 refs] python: Python/_warnings.c:501: setup_context: Assertion `((((((PyObject*)(*filename))->ob_type))->tp_flags & ((1L<<28))) != 0)' failed. Abandon
Author: Antoine Pitrou (pitrou) *
Date: 2011-05-02 20:49
A patch that seems to work under Linux and Windows (for 3.2/3.3).
Author: Antoine Pitrou (pitrou) *
Date: 2011-05-03 17:25
Victor pointed out that Py_Finalize() is not necessarily called in the main Python thread. This new patch records the thread state of the finalizing thread, and also includes a test case.
Author: Roundup Robot (python-dev)
Date: 2011-05-04 18:05
New changeset 2a19d09b08f8 by Antoine Pitrou in branch '3.2': Issue #1856: Avoid crashes and lockups when daemon threads run while the http://hg.python.org/cpython/rev/2a19d09b08f8
New changeset c892b0321d23 by Antoine Pitrou in branch 'default': Issue #1856: Avoid crashes and lockups when daemon threads run while the http://hg.python.org/cpython/rev/c892b0321d23
Author: Antoine Pitrou (pitrou) *
Date: 2011-05-04 18:06
Should be fixed in 3.2 and 3.3 now. I don't really want to bother with 2.7 and 3.1 (the GIL implementation is different), but someone can backport the patch if they want to :)
Author: Thomas Guettler (guettli) *
Date: 2013-08-07 09:21
There are some examples to work around this for Python2: http://stackoverflow.com/questions/18098475/detect-interpreter-shut-down-in-daemon-thread
Author: Roundup Robot (python-dev)
Date: 2014-06-17 02:39
New changeset 7741d0dd66ca by Benjamin Peterson in branch '2.7': avoid crashes and lockups from daemon threads during interpreter shutdown (#1856) http://hg.python.org/cpython/rev/7741d0dd66ca
Author: Matthias Klose (doko) *
Date: 2014-07-11 17:44
http://tracker.ceph.com/issues/8797 reports that the backport to 2.7 causes a regression in ceph.
Author: Ned Deily (ned.deily) *
Date: 2014-07-11 21:08
I've opened Issue21963 to track the 2.7.8 regression. Please continue any discussion there.
Author: Roundup Robot (python-dev)
Date: 2014-11-21 01:05
New changeset 4ceca79d1c63 by Antoine Pitrou in branch '2.7': Issue #21963: backout issue #1856 patch (avoid crashes and lockups when https://hg.python.org/cpython/rev/4ceca79d1c63
Author: STINNER Victor (vstinner) *
Date: 2020-03-26 23:05
bpo-1193099 was marked as a duplicate of this issue.
History
Date
User
Action
Args
2022-04-11 14:56:29
admin
set
github: 46164
2020-03-26 23:05:06
vstinner
set
messages: +
2014-11-21 01:05:23
python-dev
set
messages: +
2014-07-11 21:08:57
ned.deily
set
status: open -> closed
nosy: + ned.deily
messages: +
2014-07-11 17:44:14
doko
set
status: closed -> open
nosy: + doko, benjamin.peterson
messages: +
2014-06-17 02:39:44
python-dev
set
messages: +
2013-08-07 09:21:03
guettli
set
nosy: + guettli
messages: +
2011-05-04 18:06:56
pitrou
set
status: open -> closed
versions: - Python 3.1, Python 2.7
messages: +
resolution: fixed
stage: patch review -> resolved
2011-05-04 18:05:05
python-dev
set
nosy: + python-dev
messages: +
2011-05-03 17:25:36
pitrou
set
files: + finalizing2.patch
messages: +
2011-05-02 22:09:58
Trundle
set
nosy: + Trundle
2011-05-02 20:49:53
pitrou
set
files: + finalizing.patch
nosy: + vstinner
messages: +
stage: patch review
2011-05-02 20:36:31
pitrou
set
messages: +
versions: + Python 3.3
2010-09-20 11:55:36
pitrou
set
messages: +
2010-09-01 21:21:16
gvanrossum
set
nosy: - gvanrossum
2010-09-01 19:53:25
pitrou
set
nosy: + pitrou
messages: +
versions: + Python 3.1, Python 3.2
2010-08-03 18:09:04
terry.reedy
set
nosy: + terry.reedy
messages: +
versions: + Python 2.7, - Python 2.6, Python 2.5
2010-07-01 09:28:57
Netto
set
nosy: + Netto
2010-06-09 21:25:35
terry.reedy
set
versions: - Python 2.4
2010-04-13 17:24:58
jamescooper
set
nosy: + jamescooper
2009-06-03 14:38:27
flub
set
nosy: + flub
2009-02-09 20:30:28
ggenellina
set
nosy: + ggenellina
messages: +
2009-02-08 14:58:03
ocean-city
link
2008-12-12 02:03:53
forest
set
nosy: + forest
2008-09-24 15:47:56
amaury.forgeotdarc
set
messages: +
2008-05-12 17:11:21
georg.brandl
link
2008-05-12 17:11:14
georg.brandl
set
nosy: + georg.brandl
messages: +
2008-05-07 18:11:54
Rhamphoryncus
set
files: + python-thread_noswap-2.diff
keywords: + patch
messages: +
2008-04-29 23:52:23
larsimmisch
set
nosy: + larsimmisch
2008-02-18 21:17:16
kbk
set
nosy: + kbk
2008-01-20 19:57:56
christian.heimes
set
priority: high
2008-01-18 00:12:27
Rhamphoryncus
set
messages: +
2008-01-17 23:51:13
amaury.forgeotdarc
set
messages: +
2008-01-17 23:34:31
Rhamphoryncus
set
messages: +
2008-01-17 23:23:38
amaury.forgeotdarc
set
messages: +
2008-01-17 23:04:41
Rhamphoryncus
set
messages: +
2008-01-17 22:54:39
amaury.forgeotdarc
set
files: + thread_noswap.patch
messages: +
2008-01-17 22:39:20
amaury.forgeotdarc
set
messages: +
2008-01-17 22:35:35
Rhamphoryncus
set
files: + python-trunk-issue1856-threadexit.diff
messages: +
2008-01-17 21:32:19
gregory.p.smith
set
messages: +
2008-01-17 21:00:48
Rhamphoryncus
set
nosy: + Rhamphoryncus
messages: +
2008-01-17 20:44:14
amaury.forgeotdarc
set
nosy: + amaury.forgeotdarc
messages: +
2008-01-17 07:06:03
gregory.p.smith
set
messages: +
2008-01-17 04:24:12
gvanrossum
set
nosy: + gvanrossum
messages: +
2008-01-17 02:01:14
gregory.p.smith
create