Issue 8411: New GIL: improve condition variable emulation on NT (original) (raw)

Created on 2010-04-15 20:35 by kristjan.jonsson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
nt_cond.patch kristjan.jonsson,2010-04-15 20:35
nt_cond2.patch kristjan.jonsson,2010-08-08 20:43 Updated version of the patch
Messages (12)
msg103251 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-04-15 20:35
As per Antoine's suggestion, here is a patch to improve condition variable emulation on windows. By using the windows Semaphore (which hasn't always been available) all of the problems in emulating condition variables using Events disappear. in particular, the "lost wakeup" bug can be elimintated, and the same object easily supports bot the "signal" and "broadcast" mode of waking up threads. We can also use the CRICITAL_SECTION objects as the associated mutex which is much lighter weight than using a kernel object. I do think I´ve caught all the corner cases. It is also simpler to prove because this construct is simpler than the previous one. If, however, i have missed a spot, I'd be interested to know about it. Not completely impossible since race conditions are devious beasts. Ive tested this extensively on Windows. See also issue 8299 where this implementation was presented.
msg113029 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-05 19:29
This patches ceval_gil.h. Can these changes be unit tested? Or is this intended to be an internal performance, invisible-to-the-user, code improvement? Antoine, are you going to look at this?
msg113089 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-06 09:50
The latter. But I'd really like Antoine to look this over.
msg113301 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-08 19:41
Questions: - why does _cond_timed_wait() decrement n_waiting *twice*? - why does _cond_timed_wait() use InterlockedDecrement()? it doesn't protect against regular accesses to the same memory location, since it isn't guarded by the mutex - why don't you simply pass NULL as the third parameter to ReleaseSemaphore() in _cond_signal()? - I don't understand what you call "possible race condition" in _cond_signal(). _cond_signal() is currently always called with the corresponding mutex held, by the way. We could add a comment to make it mandatory.
msg113303 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-08 19:45
Oh, and I would call CreateSemaphore() with a max count of 1, precisely to guard us against bugs.
msg113314 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-08 20:27
Thanks for your comments. The problem with _cond_timed_wait indicates that the timed wait code hadn't been thoroughly tested at the time of submission. 1) The InterlockedDecrement was left in by mistake. It must be removed 2) The LONG was argument was a vestige from trying to prevent the benign race condition with the _cond_timed_wait. As for the race condition, it is as follows: a) Thread A is waiting in _cond_timed_wait(), n_waiting == 1 b) The wait times out and it tries to acquire the lock to decrement n_waiting. This is a critical lock, because no one is waiting for the semaphore, yet n_waiting == 1 c) Thread B, that has the lock, calls _cond_signal(). It still sees n_waiting == 1 and does a ReleaseSemaphore. It then also decrements n_waiting, and n_waiting is now 0. The Semaphore count is now 1 d) Thread A finally gets the lock and decrements n_waiting. n_waiting is now -1. n_waiting has thus been decremented twice. But this is ok, since there was also an extra ReleaseSemaphore call and the semaphore count is now 1. The upshot of this is that this race condition is benign, and what happens is that the next time someone does _cond_wait(), the semaphore is reduced to 0, n_waiting goes to 0, and the thread passes right through. Thus, should this event happen, the only upshot of it is that one _cond_wait call will be ineffective. I'll upload a new patch with better explanation.
msg113315 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-08 20:28
"this is a critical lock" should read "this is a critical moment" :)
msg113318 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-08 20:43
Updated patch file
msg113400 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-09 08:40
You wan't to have the Semaphore with a max count of infinite because a) of the "race condition" mentioned (actually, possible discrepancy between n_waiting and actual semaphore value), which can cause the semaphore count to temporarily go positive, and b) If you implement _cond_broadcast(), which would have code like: int waiting = cond->n_waiting; if (waiting > 0) { cond->n_waiting = 0 ReleaseSemaphore(cond->sem, waiting, 0); } The semaphore value going above 0 is not a "bug" but an implementation detail stemming from the fact that n_waiting cannot reliably reflect the semaphore state at each particular time. Due to locking, it may lag behind the state a little bit.
msg113540 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-10 13:50
Thanks Kristján, I've committed the patch with modified comments in r83932. Unfortunately, the Windows buildbots are in a wreck, so this won't be checked immediately (I only have a single-core XP VM under which the patch ran fine).
msg113559 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-10 21:24
Great. I hope you agree that it is simpler. I'm afraid my explanations can be somewhat long-winded so I hope you found a better way to document my "pseudo race condition".
msg113783 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-13 15:13
Closing as fixed, since nothing strange seems to have appeared so far.
History
Date User Action Args
2022-04-11 14:56:59 admin set github: 52658
2010-08-13 15:13:16 pitrou set status: open -> closedkeywords:patch, patch, needs reviewresolution: accepted -> fixedmessages: +
2010-08-10 21:24:42 kristjan.jonsson set keywords:patch, patch, needs reviewstatus: pending -> openmessages: +
2010-08-10 15:17:34 ocean-city set status: open -> pendingnosy: - ocean-citykeywords:patch, patch, needs review
2010-08-10 15:16:57 ocean-city set status: pending -> openmessages: -
2010-08-10 14:54:21 ocean-city set status: open -> pendingkeywords:patch, patch, needs review
2010-08-10 14:53:28 ocean-city set status: pending -> opennosy: + ocean-citymessages: + keywords:patch, patch, needs review
2010-08-10 13:50:24 pitrou set status: open -> pendingassignee: pitroutitle: Improve condition variable emulation on NT -> New GIL: improve condition variable emulation on NTkeywords:patch, patch, needs reviewresolution: acceptedmessages: + stage: patch review -> resolved
2010-08-09 08:40:17 kristjan.jonsson set keywords:patch, patch, needs reviewmessages: +
2010-08-08 20:43:09 kristjan.jonsson set keywords:patch, patch, needs reviewfiles: + nt_cond2.patchmessages: +
2010-08-08 20:28:24 kristjan.jonsson set keywords:patch, patch, needs reviewmessages: +
2010-08-08 20:27:29 kristjan.jonsson set keywords:patch, patch, needs reviewmessages: +
2010-08-08 19:45:59 pitrou set keywords:patch, patch, needs reviewmessages: +
2010-08-08 19:41:06 pitrou set keywords:patch, patch, needs reviewnosy: + jyasskinmessages: +
2010-08-08 16:05:07 pitrou set keywords:patch, patch, needs reviewnosy: + tim.golden, brian.curtin
2010-08-07 07:37:08 ocean-city unlink issue9116 dependencies
2010-08-06 10:50:04 ocean-city link issue9116 dependencies
2010-08-06 09:50:04 kristjan.jonsson set keywords:patch, patch, needs reviewmessages: +
2010-08-05 19:29:31 terry.reedy set nosy: + terry.reedymessages: + keywords:patch, patch, needs reviewtype: performancestage: patch review
2010-04-15 20:35:42 kristjan.jonsson create