Issue 30768: PyThread_acquire_lock_timed() should recompute the timeout when interrupted by a signal (original) (raw)

Created on 2017-06-26 13:50 by vstinner, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
interrupted_lock.py vstinner,2017-10-24 10:21
test.patch vstinner,2017-10-24 10:30
Pull Requests
URL Status Linked Edit
PR 4103 merged vstinner,2017-10-24 10:28
Messages (10)
msg296896 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-26 13:50
The current code of PyThread_acquire_lock_timed() (the implementation not using semaphore) doesn't compute correctly the timeout when pthread_cond_timedwait() is interrupted by a signal. We should recompute the timeout using a deadline. Something like select.select(): if (tvp) deadline = _PyTime_GetMonotonicClock() + timeout; do { ... use tvp if (errno != EINTR) break; /* select() was interrupted by a signal */ if (PyErr_CheckSignals()) goto finally; if (tvp) { timeout = deadline - _PyTime_GetMonotonicClock(); if (timeout < 0) { n = 0; break; } _PyTime_AsTimeval_noraise(timeout, &tv, _PyTime_ROUND_CEILING); /* retry select() with the recomputed timeout */ } } while (1);
msg296898 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-26 13:51
See also the old bpo-12822: "NewGIL should use CLOCK_MONOTONIC if possible.".
msg296899 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-26 13:51
See also the PEP 475 "Retry system calls failing with EINTR" and PEP 418 (time.monotonic).
msg304892 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 10:21
interrupted_lock.py: test threading.Lock.acquire(timeout=1.0) with SIGALRM sent every 1 ms (so up to 1000 times in total). Example: haypo@selma$ ./python interrupted_lock.py acquire(timeout=1.0) took 1.0 seconds and got 1000 signals Oh, in fact, threading.Lock.acquire(timeout=1.0) already recomputes the timeout when interrupted. In Python stdlib, PyThread_acquire_lock_timed() is only called from one place with intr_flag=0: faulthandler watchdog thread, but this thread blocks all signals: /* we don't want to receive any signal */ sigfillset(&set); pthread_sigmask(SIG_SETMASK, &set, NULL);
msg304893 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 10:28
Ah, I found another caller of PyThread_acquire_lock_timed() with a timeout > 0 and intr_flag=0: _enter_buffered_busy() of Modules/_io/bufferedio.c: /* When finalizing, we don't want a deadlock to happen with daemon * threads abruptly shut down while they owned the lock. * Therefore, only wait for a grace period (1 s.). * Note that non-daemon threads have already exited here, so this * shouldn't affect carefully written threaded I/O code. */ st = PyThread_acquire_lock_timed(self->lock, (PY_TIMEOUT_T)1e6, 0);
msg304895 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 10:30
I wrote PR 4103 to fix the pthread+semaphore implementation of PyThread_acquire_lock_timed(). Apply PR 4103, apply attached test.patch, recompile Python, and run interrupted_lock.py to test the PR. Result: haypo@selma$ ./python interrupted_lock.py acquire(timeout=1.0) took 1.0 seconds and got 911 signals
msg304896 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 10:31
To check if you are using pthread+semaphore, use: haypo@selma$ ./python -c 'import sys; print(sys.thread_info)' sys.thread_info(name='pthread', lock='semaphore', version='NPTL 2.25') Here you have pthread+semaphore. It's Fedora 26 running Linux kernel 4.13.
msg304951 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 23:53
New changeset 850a18e03e8f8309bc8c39adc6e7d51a4568cd9a by Victor Stinner in branch 'master': bpo-30768: Recompute timeout on interrupted lock (GH-4103) https://github.com/python/cpython/commit/850a18e03e8f8309bc8c39adc6e7d51a4568cd9a
msg304952 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-25 00:13
I merged my PR. Thanks Antoine Pitrou for the review! This change only impacts the io.BufferedWriter and io.BufferedReader during Python finalization. It has no effect on theading.Lock.acquire(). It might impact faulthandler.dump_traceback_later(), but in practice, it shouldn't change anything since the internal faulthandler watchdog thread blocks all signals. If I understand correctly, if the system clock is stepped back by 1 hour during Python finalization, after PyThread_acquire_lock_timed() computed the deadline, but before sem_timedwait() completed, _enter_buffered_busy() can be blocked during 1 hour. Moving the system clock backward by 1 hour occurs once a year on the DST change. But the race condition is unlikely since the size of the time window is only 1 second. The DST change should occur at Python shutdown when an io.BufferedReader or io.BufferedWriter is used. Since the race condition seems very unlikely and was never reported by another user, I propose to not backward this change. Moreover, I'm not confident to modify locks in a stable release :-)
msg304953 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-25 00:15
Oh, the pthread condvar+mutex implementation still has the bug, so I reopen the issue.
History
Date User Action Args
2022-04-11 14:58:48 admin set github: 74953
2017-10-25 00:15:14 vstinner set status: closed -> openresolution: fixed -> messages: +
2017-10-25 00:13:12 vstinner set status: open -> closedresolution: fixedstage: patch review -> resolved
2017-10-25 00:13:03 vstinner set messages: +
2017-10-24 23:53:35 vstinner set messages: +
2017-10-24 10:31:59 vstinner set messages: +
2017-10-24 10:30:17 vstinner set files: + test.patchmessages: +
2017-10-24 10:28:11 vstinner set keywords: + patchstage: patch reviewpull_requests: + <pull%5Frequest4073>
2017-10-24 10:28:03 vstinner set messages: +
2017-10-24 10:21:07 vstinner set files: + interrupted_lock.pymessages: +
2017-06-26 13:51:48 vstinner set messages: +
2017-06-26 13:51:09 vstinner set messages: +
2017-06-26 13:50:01 vstinner create