msg221180 - (view) |
Author: Steve (tupl) |
Date: 2014-06-21 15:44 |
I am attempting to join a thread after a previous join was interrupted by Ctrl-C. I did not find any warning for this case in threading module docs, so I assume this is legal. The full test script is attached, but the essential code is: def work(t): sleep(t) twork = 3; twait = 4 t = Thread(target=work, args=(twork,)) try: t.start() t.join(twait) # here I do Ctrl-C except KeyboardInterrupt: pass t.join() # this hangs if twork < twait I can observe the following reproduce sequence: 1. start another thread that sleeps (or works) for T seconds 2. join that thread with timeout longer than T 3. before thread finishes and join returns, hit Ctrl-C to raise KeyboardInterrupt (KI) 4. after T seconds, thread finishes its (Python) code and KI is raised 5. Process Explorer confirms that thread really terminates (looked at .ident()) 6. thread still reports .is_alive() == True 7. second attempt to join that thread hangs indefinitely I tried replacing try-except clause with custom signal handler for SIGINT, as shown in the script. If the handler does not raise an exception, the thread can be normally joined. If it does, however, the behavior is the same as with default handler. My _guess_ is that the exception prevents some finishing code that puts Thread into proper stopped state after its target completes. Running Python 3.4.0 on Windows 7 x64 |
|
|
msg223608 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-07-21 20:22 |
This works for me under Linux. |
|
|
msg315761 - (view) |
Author: Paul Goins (pdgoins-work) |
Date: 2018-04-25 22:43 |
I've confirmed this issue on Windows. Attached is an SSCCE which seems to reliably reproduce the issue in a Windows environment. Long story short: if a KeyboardInterrupt occurs during Thread.join(), there's a good chance that Thread._is_stopped never gets set. |
|
|
msg315762 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-04-25 22:49 |
Thanks for the update. Since I still can't reproduce on Linux, perhaps you (or one of our resident Windows experts) can try to propose a patch fixing the issue? |
|
|
msg315811 - (view) |
Author: Paul Goins (pdgoins-work) |
Date: 2018-04-26 19:04 |
Not sure if I'll do the full fix (need to check w/ my employer), but I'm doing some investigation. Here's what I know so far: At the Python level, the KeyboardInterrupt is being raised within _wait_for_tstate_lock, on "elif lock.acquire(block, timeout)". Going into the C code, it looks like this goes through lock_PyThread_acquire_lock -> acquire_timed -> PyThread_acquire_lock_timed. acquire_timed . lock_PyThread_acquire_lock will abort the lock if it receives PY_LOCK_INTR from acquire_timed. My best guess right now is that PyThread_acquire_lock_timed never returns PY_LOCK_INTR. Indeed, I see this comment at the top of the NT version of that function: /* Fow now, intr_flag does nothing on Windows, and lock acquires are * uninterruptible. */ And indeed, the thread_pthread.h implementations both have a path for returning PY_LOCK_INTR, while the thread_nt.h version does not. ...And that's where I am thus far. |
|
|
msg315815 - (view) |
Author: Eryk Sun (eryksun) *  |
Date: 2018-04-26 19:32 |
For POSIX systems, try the following test function several times. For the bug to manifest, Thread._wait_for_tstate_lock has to be interrupted in between acquiring and releasing the sentinel lock. Maybe it could use a reentrant lock in order to avoid this problem. import os import signal import threading import time def raise_sigint(): print('raising SIGINT') time.sleep(0.5) if os.name == 'nt': os.kill(0, signal.CTRL_C_EVENT) else: os.kill(os.getpid(), signal.SIGINT) print('finishing') def test(f=raise_sigint): global g_sigint g_sigint = threading.Thread(target=f, name='SIGINT') g_sigint.start() print('waiting') for i in range(100): try: if not g_sigint.is_alive(): break g_sigint.join(0.01) except KeyboardInterrupt: print('KeyboardInterrupt ignored') print('g_sigint is alive:', g_sigint.is_alive()) POSIX-only code normally wouldn't join() with a small timeout in a loop as in the above example. This is a workaround for the problem that's demonstrated in , in which a signal doesn't interrupt a wait on the main thread. Other than time.sleep(), most waits in Windows CPython have not been updated to include the SIGINT Event object when waiting in the main thread. (It's possible to fix some cases such as waiting on locks, but only as long as the implementation continues to use semaphores and kernel waits instead of native condition variables with SRW locks.) |
|
|
msg315966 - (view) |
Author: Paul Goins (pdgoins-work) |
Date: 2018-04-30 21:16 |
Focusing on the Windows case specifically... One way to possibly make this work (although perhaps not as clean as may be desired) would be to add polling logic into the thread_nt.h version of PyThread_acquire_lock_timed. That would have the benefit of avoiding the complexity of the various "non recursive mutex" implementations (i.e. semaphores vs "emulated" condition variables vs native condition variables) and may be less code than setting up "alertable" WaitForObjectSignleObjectEx calls (plus whatever else needs to be done for the SRW-lock-based code path). Thoughts or feedback? (I've not done any mainline Python commits yet so I'm totally ready to be completely wrong or misguided here.) |
|
|
msg315967 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-04-30 21:21 |
> One way to possibly make this work (although perhaps not as clean as may be desired) would be to add polling logic into the thread_nt.h version of PyThread_acquire_lock_timed. I think adding polling to such a widely-used primitive is out of question. |
|
|
msg315968 - (view) |
Author: Paul Goins (pdgoins-work) |
Date: 2018-04-30 21:32 |
> I think adding polling to such a widely-used primitive is out of question. I'm guessing this is because of the perceived performance impact? (That's the main thought I have against polling in this case.) Or is it something else? I can certainly look at tweaking the 3 mutex implementations I mentioned previously, but I do expect that will be a bit more code; I at least wanted to put the "simpler" idea out there. Fully knowing that "simpler" isn't necessarily better. |
|
|
msg315969 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-04-30 21:39 |
> I'm guessing this is because of the perceived performance impact? The problem is polling is pretty detrimental to power saving on modern CPUs. There might also be a performance impact that makes things worse :-) More generally, we really would like locks to be interruptible under Windows, as it would be useful in many more situations than joining threads. Unfortunately, as you have discovered we have several lock implementations for Windows right now. The SRWLock one is probably difficult to make interruptible, but it's never enabled by default. The Semaphore one looks legacy, so could perhaps be removed. Remains the condition variable-based implementation. |
|
|
msg315970 - (view) |
Author: Paul Goins (pdgoins-work) |
Date: 2018-04-30 21:47 |
Ahh, thanks for the explanation. I didn't think about that. Let's *not* do that then. :) I'll see if I can squeeze in some time to play with the alternatives. |
|
|
msg316023 - (view) |
Author: Paul Goins (pdgoins-work) |
Date: 2018-05-01 21:50 |
Okay, I think I need to abandon my research into this. This does seem to have quite an amount of complexity to it and is probably more than I should be taking on at this point in time. Anyone else who wants to look at this, consider it fair game. Parting thoughts based on my limited expertise in the area, take them or ignore them: * Semaphore-based AllocNonRecursiveMutex (!_PY_USE_CV_LOCKS): Using WaitForSingleObjectEx with alertable set to TRUE may be one path forward, however it seems like that would involve tracking all threads performing a ctrl-c-interruptible wait and calling QueueUserAPC with a no-op callback for each such thread to cause the wait to terminate early. I don't particularly like the need to roll-our-own tracking, but at least off-hand and based on my somewhat limited experience in this area, I don't know of a better way. Hopefully someone else does. * CriticalSection/Semaphore-based AllocNonRecursiveMutex (_PY_USE_CV_LOCKS with _PY_EMULATED_WIN_CV): I don't know of a way of interrupting the CriticalSection lock, but the WaitForSingleObjectEx strategy may be usable on the semaphore-based condition variable. * SRWLock/ConditionVariable-based AllocNonRecursiveMutex (_PY_USE_CV_LOCKS with !_PY_EMULATED_WIN_CV): Similarly, I don't know of a way to interrupt the SRWLock. However, similar to WaitForSingleObjectEx, it may be possible to wake the condition variables via a Ctrl-C if we add our own tracking for such. I'm not sure if this gets us quite to where we need to be or not. Hopefully the above notes are of some value. |
|
|
msg316024 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2018-05-01 22:08 |
multiprocessing semaphores support Ctrl-C under Windows, so it should be doable for regular locks as well (notice the `sigint_event`): https://github.com/python/cpython/blob/master/Modules/_multiprocessing/semaphore.c#L109-L146 |
|
|
msg316025 - (view) |
Author: Paul Goins (pdgoins-work) |
Date: 2018-05-01 22:24 |
Good point, I forgot about WaitForMultipleObjectsEx; something like that seems like it would be much simpler for the first 2 cases. |
|
|
msg337846 - (view) |
Author: gaborjbernat (gaborjbernat) * |
Date: 2019-03-13 12:33 |
I think I'm hitting this with subprocesses inside tox (parallel feature), any plans to fix this? |
|
|
msg346191 - (view) |
Author: Géry (maggyero) * |
Date: 2019-06-21 08:17 |
I am having the same blocked signal issue on Windows when using Thread.join. This program does not print "interrupted" after pressing Ctrl+C: import threading import time def f(): while True: print("processing") time.sleep(1) if __name__ == "__main__": try: thread = threading.Thread(target=f) thread.start() thread.join() except KeyboardInterrupt: print("interrupted") For reference, 2 years ago Nathaniel Smith gave an interesting explanation here: https://mail.python.org/pipermail/python-dev/2017-August/148800.html. |
|
|
msg402714 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2021-09-27 14:07 |
> I am having the same blocked signal issue on Windows when using Thread.join. This program does not print "interrupted" after pressing Ctrl+C: This is a different issue: bpo-29971. Currently, threading.Lock.acquire() cannot be interrupted by CTRL+C. |
|
|
msg402717 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2021-09-27 14:11 |
I mark this issue as a duplicate of bpo-45274. -- I fixed bpo-45274 with this change: New changeset a22be4943c119fecf5433d999227ff78fc2e5741 by Victor Stinner in branch 'main': bpo-45274: Fix Thread._wait_for_tstate_lock() race condition (GH-28532) https://github.com/python/cpython/commit/a22be4943c119fecf5433d999227ff78fc2e5741 I tested join.py with the fix. It nows displays: --- vstinner@DESKTOP-DK7VBIL C:\vstinner\python\main>python x.py Running Debug|x64 interpreter... started. Press Ctrl-C now Ctrl-C [2.99] done=True alive=False finish [2.99] done=True alive=False Terminate batch job (Y/N)? n --- The script no longer hangs. |
|
|