Issue 34004: Acquiring locks not interrupted by signals on musl libc (original) (raw)
Created on 2018-06-29 21:43 by Joseph Sible, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (8)
Author: Joseph Sible (Joseph Sible)
Date: 2018-06-29 21:43
When Python is built on Alpine Linux or in any other configuration that uses musl libc, calls to Lock.acquire() can't be interrupted by signals. This bug is caught by test_lock_acquire_interruption and test_rlock_acquire_interruption in Lib/test/test_threadsignals.py, both of which fail when Python is built on musl.
POSIX explicitly says that sem_timedwait ever failing with EINTR is optional: http://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_timedwait.html#tag_16_508_05
And musl deliberately chooses not to support it: http://www.openwall.com/lists/musl/2018/02/24/3 http://git.musl-libc.org/cgit/musl/commit/?id=c0ed5a201b2bdb6d1896064bec0020c9973db0a1
However, we incorrectly rely on it for correct operation. Our documentation https://docs.python.org/3.6/library/threading.html#threading.Lock.acquire just says that "Lock acquires can now be interrupted by signals on POSIX", not that any optional POSIX features are required for it.
A similar bug was #11223, which was the same problem but with pthread_cond_timedwait, which is used when semaphores aren't available.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2018-09-12 20:48
New changeset 5b10d5111d7a855297654af9045f8907b7d3dd08 by Benjamin Peterson in branch 'master': closes bpo-34004: Skip lock interruption tests on musl. (GH-9224) https://github.com/python/cpython/commit/5b10d5111d7a855297654af9045f8907b7d3dd08
Author: miss-islington (miss-islington)
Date: 2018-09-12 21:10
New changeset b608fcd444c00ff37a19d34e4eeadb1221fb6436 by Miss Islington (bot) in branch '3.7': closes bpo-34004: Skip lock interruption tests on musl. (GH-9224) https://github.com/python/cpython/commit/b608fcd444c00ff37a19d34e4eeadb1221fb6436
Author: miss-islington (miss-islington)
Date: 2018-09-12 21:11
New changeset 5a435eac1b83f080c9dfceff0de0d639541e4bcb by Miss Islington (bot) in branch '3.6': closes bpo-34004: Skip lock interruption tests on musl. (GH-9224) https://github.com/python/cpython/commit/5a435eac1b83f080c9dfceff0de0d639541e4bcb
Author: Joseph Sible (Joseph Sible)
Date: 2018-09-12 22:29
How is this considered "fixed"? Why couldn't this be actually fixed by using eventfd instead of semaphores when they're available, for example?
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2018-09-12 22:36
Reimplementing locks with eventfd can be another issue. Such a change can only land in a new Python version, though. We'll just have to consider musl unsupported for interrupting locks in our current maintenance releases as I have done.
How likely is it that musl will change to allow returning EINTR from sem_timedwait given that the only stated reason not to is very old kernel versions?
Author: Joseph Sible (Joseph Sible)
Date: 2018-09-12 23:03
Re musl changing their behavior, see https://www.openwall.com/lists/musl/2018/09/07/1 and the resulting thread.
In addition to the old kernel version issue, two other issues were raised:
- EINTR makes programming mistakes more likely, as people won't think to handle it. I don't give much weight to this point.
- Most of the time, counting on receiving an EINTR results in race conditions. Our code seems to be affected by this too. Even on glibc, a signal at just the "right" time could result in it not being interrupted. This is why I think moving to an eventfd or something would be better, since we could then use pselect/ppoll/etc. to avoid the signal blocking race.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2018-09-12 23:33
On Wed, Sep 12, 2018, at 16:03, Joseph Sible wrote:
- Most of the time, counting on receiving an EINTR results in race conditions. Our code seems to be affected by this too. Even on glibc, a signal at just the "right" time could result in it not being interrupted. This is why I think moving to an eventfd or something would be better, since we could then use pselect/ppoll/etc. to avoid the signal blocking race.
The race is a good point.
However, switching to eventfd is not trivial. It would be the third locking implementation in python just for pthreads. Using eventfd would require reimplementing all the userspace parts of locking (atomics, appropriate memory barriers, and spinning) ourselves. futex is carefully optimized for userspace locking, whereas I've never heard of normal program locking being done with eventfd before.
If the expected usecase for interrupting locks is user pressing C-c, being 99.99999% correct is probably enough.
So, it's fine with me if you open another issue for reimplementing locks with eventfd. But, I don't personally have the desire to do it.
History
Date
User
Action
Args
2022-04-11 14:59:02
admin
set
github: 78185
2018-09-12 23:33:39
benjamin.peterson
set
messages: +
2018-09-12 23:03:35
Joseph Sible
set
messages: +
2018-09-12 22:36:44
benjamin.peterson
set
messages: +
2018-09-12 22:29:46
Joseph Sible
set
messages: +
2018-09-12 21:11:48
miss-islington
set
messages: +
2018-09-12 21:10:59
miss-islington
set
nosy: + miss-islington
messages: +
2018-09-12 20:48:22
miss-islington
set
pull_requests: + <pull%5Frequest8658>
2018-09-12 20:48:15
miss-islington
set
pull_requests: + <pull%5Frequest8657>
2018-09-12 20:48:07
benjamin.peterson
set
status: open -> closed
nosy: + benjamin.peterson
messages: +
resolution: fixed
stage: patch review -> resolved
2018-09-12 19:48:37
benjamin.peterson
set
keywords: + patch
stage: patch review
pull_requests: + <pull%5Frequest8655>
2018-06-29 21:43:37
Joseph Sible
create