msg103251 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2010-04-15 20:35 |
As per Antoine's suggestion, here is a patch to improve condition variable emulation on windows. By using the windows Semaphore (which hasn't always been available) all of the problems in emulating condition variables using Events disappear. in particular, the "lost wakeup" bug can be elimintated, and the same object easily supports bot the "signal" and "broadcast" mode of waking up threads. We can also use the CRICITAL_SECTION objects as the associated mutex which is much lighter weight than using a kernel object. I do think I´ve caught all the corner cases. It is also simpler to prove because this construct is simpler than the previous one. If, however, i have missed a spot, I'd be interested to know about it. Not completely impossible since race conditions are devious beasts. Ive tested this extensively on Windows. See also issue 8299 where this implementation was presented. |
|
|
msg113029 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2010-08-05 19:29 |
This patches ceval_gil.h. Can these changes be unit tested? Or is this intended to be an internal performance, invisible-to-the-user, code improvement? Antoine, are you going to look at this? |
|
|
msg113089 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2010-08-06 09:50 |
The latter. But I'd really like Antoine to look this over. |
|
|
msg113301 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-08-08 19:41 |
Questions: - why does _cond_timed_wait() decrement n_waiting *twice*? - why does _cond_timed_wait() use InterlockedDecrement()? it doesn't protect against regular accesses to the same memory location, since it isn't guarded by the mutex - why don't you simply pass NULL as the third parameter to ReleaseSemaphore() in _cond_signal()? - I don't understand what you call "possible race condition" in _cond_signal(). _cond_signal() is currently always called with the corresponding mutex held, by the way. We could add a comment to make it mandatory. |
|
|
msg113303 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-08-08 19:45 |
Oh, and I would call CreateSemaphore() with a max count of 1, precisely to guard us against bugs. |
|
|
msg113314 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2010-08-08 20:27 |
Thanks for your comments. The problem with _cond_timed_wait indicates that the timed wait code hadn't been thoroughly tested at the time of submission. 1) The InterlockedDecrement was left in by mistake. It must be removed 2) The LONG was argument was a vestige from trying to prevent the benign race condition with the _cond_timed_wait. As for the race condition, it is as follows: a) Thread A is waiting in _cond_timed_wait(), n_waiting == 1 b) The wait times out and it tries to acquire the lock to decrement n_waiting. This is a critical lock, because no one is waiting for the semaphore, yet n_waiting == 1 c) Thread B, that has the lock, calls _cond_signal(). It still sees n_waiting == 1 and does a ReleaseSemaphore. It then also decrements n_waiting, and n_waiting is now 0. The Semaphore count is now 1 d) Thread A finally gets the lock and decrements n_waiting. n_waiting is now -1. n_waiting has thus been decremented twice. But this is ok, since there was also an extra ReleaseSemaphore call and the semaphore count is now 1. The upshot of this is that this race condition is benign, and what happens is that the next time someone does _cond_wait(), the semaphore is reduced to 0, n_waiting goes to 0, and the thread passes right through. Thus, should this event happen, the only upshot of it is that one _cond_wait call will be ineffective. I'll upload a new patch with better explanation. |
|
|
msg113315 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2010-08-08 20:28 |
"this is a critical lock" should read "this is a critical moment" :) |
|
|
msg113318 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2010-08-08 20:43 |
Updated patch file |
|
|
msg113400 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2010-08-09 08:40 |
You wan't to have the Semaphore with a max count of infinite because a) of the "race condition" mentioned (actually, possible discrepancy between n_waiting and actual semaphore value), which can cause the semaphore count to temporarily go positive, and b) If you implement _cond_broadcast(), which would have code like: int waiting = cond->n_waiting; if (waiting > 0) { cond->n_waiting = 0 ReleaseSemaphore(cond->sem, waiting, 0); } The semaphore value going above 0 is not a "bug" but an implementation detail stemming from the fact that n_waiting cannot reliably reflect the semaphore state at each particular time. Due to locking, it may lag behind the state a little bit. |
|
|
msg113540 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-08-10 13:50 |
Thanks Kristján, I've committed the patch with modified comments in r83932. Unfortunately, the Windows buildbots are in a wreck, so this won't be checked immediately (I only have a single-core XP VM under which the patch ran fine). |
|
|
msg113559 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2010-08-10 21:24 |
Great. I hope you agree that it is simpler. I'm afraid my explanations can be somewhat long-winded so I hope you found a better way to document my "pseudo race condition". |
|
|
msg113783 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-08-13 15:13 |
Closing as fixed, since nothing strange seems to have appeared so far. |
|
|