msg239611 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-03-30 13:59 |
I have a pattern where I read from a queue with a timeout, generally like this: while True: reader = asyncio.async(wait_for(queue.get(), 0.1)) try: item = (yield from reader) except asyncio.TimeoutError: reader.cancel() continue This is to have a loop where we try to get items from the queue with a timeout. Prior to Python 3.5, wait_for() doesn't automatically cancel the reading task, so I have to do it explicitly above. The code has a strange race condition where, if a taks calls put_nowait on a reader task that is just about to be cancelled due to timeout, then the item that wast put onto the queue gets lost. In the tests framework, the minimal test case I can come up with is mainly this code: q = asyncio.Queue(loop=loop) reader = loop.create_task(q.get()) loop.run_until_complete(asyncio.sleep(0.01, loop=loop)) q.put_nowait(1) q.put_nowait(2) reader.cancel() When the reader gets cancelled, the item `1` is lost into the ether, and when you create another reader for the same queue, it only gets the second item `2`. I would expect that, if a reader task is cancelled, the item at the head of the queue doesn't get lost. Either the reader succeeds and the item is removed, or it doesn't succeed and the item is left alone. I attach a patch to the tulip tests that reproduces the problem in both Python 3.4.2 and tulip hg. |
|
|
msg239624 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2015-03-30 17:06 |
Looks like a valid bug report, I like the test you provided, and the fix seems on the right track. Comments on the fix: - I'd really like to see a rietveld diff for both patches. - Are there other places where a cancellation can have a similar effect? Maybe the same logic in put()? - The fix uses waiter.result() but if waiter itself is cancelled this will raise a CancelledError. This is probably fine because in that case no value has been put into it. But now this got me thinking -- how can the yield-from in get() receive a CancelledError without the waiter being cancelled? IOW how does the fix actually work??? (Maybe because the generator has already returned when the Task wrapping it is cancelled?) |
|
|
msg239625 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-03-30 17:20 |
I created a codereview issue: https://codereview.appspot.com/222930043 |
|
|
msg239627 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-03-30 17:38 |
> - Are there other places where a cancellation can have a similar effect? Maybe the same logic in put()? Hm.. I didn't look, but yes, it does look like it might be affected by the same issue. I'll try to create a test for that to confirm. > how can the yield-from in get() receive a CancelledError without the waiter being cancelled? Well, it definitely gets a CancelledError in the return (yield from waiter). I added extensive logging and could confirm this is the case. What happens is that 1. Test creates a task q.get() 2. The task coroutine, inside Queue.get, blocks on the yield from waiter 3. a couple of items are put in the queue 4. Test code calls task.cancel() (where `task` is running the coroutine q.get) 5. The Queue.get task receives a CancelledError because it was cancelled. The exception appears to come from the `yield from waiter` only because this is the place where the Queue.get task was suspended, but waiter itself is not cancelled. |
|
|
msg239628 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2015-03-30 17:44 |
Make sense. I'll be waiting for your updated patch. Thanks for both the bug report and the fix! |
|
|
msg239679 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-03-31 10:22 |
So I uploaded a new patch version fixing a similar problem in put(). |
|
|
msg239738 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2015-03-31 17:17 |
I'm sorry, I don't have time to review this (and it's subtle enough that I don't want to approve it without understanding). Maybe Victor understands? |
|
|
msg240006 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2015-04-03 16:01 |
queue_bug.py: script to reproduce the bug. I confirm that Queue.get() sometimes looses items when it is cancelled. The waiter contains the result, but the waiter is lost when get() is cancelled. Queue.get() waiter got a result, but Queue.get() wakeup is only *scheduled* yet. It may even be executed in the same iteration, or maybe in the next loop iteration. I didn't review patches yet. |
|
|
msg246762 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-07-15 13:31 |
Don't know if it helps, but I made a github pull request for this: https://github.com/python/asyncio/pull/256 |
|
|
msg248055 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2015-08-05 17:59 |
New changeset 7aa2d3e1c885 by Yury Selivanov in branch '3.4': Issue #23812: Fix asyncio.Queue.get() to avoid loosing items on cancellation. https://hg.python.org/cpython/rev/7aa2d3e1c885 New changeset d5644d7e222d by Yury Selivanov in branch '3.5': Issue #23812: Fix asyncio.Queue.get() to avoid loosing items on cancellation. https://hg.python.org/cpython/rev/d5644d7e222d New changeset 8f581da70ccd by Yury Selivanov in branch 'default': Issue #23812: Fix asyncio.Queue.get() to avoid loosing items on cancellation. https://hg.python.org/cpython/rev/8f581da70ccd |
|
|
msg248056 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2015-08-05 17:59 |
The fix is committed. Closing the issue. Thanks a lot, Gustavo! |
|
|
msg248137 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2015-08-06 18:04 |
New changeset 024d4f4011c9 by Yury Selivanov in branch '3.4': Issue #23812: Fix getter-cancellation with many pending getters code path https://hg.python.org/cpython/rev/024d4f4011c9 New changeset 2752fe734bfb by Yury Selivanov in branch '3.5': Merge 3.4 (issue #23812) https://hg.python.org/cpython/rev/2752fe734bfb New changeset e54c684e6788 by Yury Selivanov in branch 'default': Merge 3.5 (issue #23812) https://hg.python.org/cpython/rev/e54c684e6788 |
|
|
msg248138 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2015-08-06 18:19 |
Guido, Victor, I've just pushed a commit to fix a misspelled method call in queues.py (related to the previous commit in this issue). Along with fixing the bug and writing a unittest for it, I discovered an issue with the current queues design. Here's an outline of the new unittest: 1. we have a queue with three queue.get() coroutines waiting for its items: `g1`, `g2`, `g3` 2. we do `push_nowait(i1)` and `push_nowait(i2)`; we cancel `g1` 3. the loop gets the chance to run 4. `g1` detects that it was cancelled and pushes the item `i1` in front of the queue (otherwise we just lose it). Now, the problem is that `g2` has already received `i2` (by `push_nowait`); and `i1` will end up consumed by `g3`! This all means that with getter coroutines being cancelled, it's possible that getters might receive queue items out of order. So the question is: is this a matter of documenting this behaviour, or we should consider some other design for queues implementation? |
|
|
msg248144 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2015-08-06 20:00 |
Honestly, I've lost track of the queue design. Maybe the push-back on cancellation is just wrong? After all, if a coroutine has received an item, it's out of the queue, even if it gets cancelled before it can do anything with the item. On Thu, Aug 6, 2015 at 8:19 PM, Yury Selivanov <report@bugs.python.org> wrote: > > Yury Selivanov added the comment: > > Guido, Victor, > > I've just pushed a commit to fix a misspelled method call in queues.py > (related to the previous commit in this issue). > > Along with fixing the bug and writing a unittest for it, I discovered an > issue with the current queues design. > > Here's an outline of the new unittest: > > 1. we have a queue with three queue.get() coroutines waiting for its > items: `g1`, `g2`, `g3` > > 2. we do `push_nowait(i1)` and `push_nowait(i2)`; we cancel `g1` > > 3. the loop gets the chance to run > > 4. `g1` detects that it was cancelled and pushes the item `i1` in front of > the queue (otherwise we just lose it). > > Now, the problem is that `g2` has already received `i2` (by > `push_nowait`); and `i1` will end up consumed by `g3`! > > This all means that with getter coroutines being cancelled, it's possible > that getters might receive queue items out of order. > > So the question is: is this a matter of documenting this behaviour, or we > should consider some other design for queues implementation? > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue23812> > _______________________________________ > |
|
|
msg248145 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2015-08-06 20:00 |
Honestly, I've lost track of the queue design. Maybe the push-back on cancellation is just wrong? After all, if a coroutine has received an item, it's out of the queue, even if it gets cancelled before it can do anything with the item. |
|
|
msg248146 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2015-08-06 20:08 |
> Honestly, I've lost track of the queue design. Maybe the push-back on cancellation is just wrong? After all, if a coroutine has received an item, it's out of the queue, even if it gets cancelled before it can do anything with the item. I think the push-back is a right thing. At least, I wouldn't expect ``wait_for(queue.get(), 0.1)`` code to lose queue items *sometimes* in big codebases. |
|
|
msg248148 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-08-06 20:42 |
I don't think the order for multiple concurrent getters matters that much. With analogy with the threading case, if multiple threads are blocked get()ing an item from the same queue, I would not presume to expect anything about the ordering which thread receives which item. That being said, I'm sorry patch caused this problem, but losing items is an infinitely worse problem. That being said, I think the problem stems from the way queue get is designed. Currently it is something like: 1. get(): in case the queue is empty, create a Future, add it to "_getters", then begin waiting for this future; 2. When an item is being put(), take the first getter and set its result to the item being put; 3. the get() coroutine resumes and the item is found in the futures's result. A better design is to make it so the future that get() is waiting for doesn't actually receive the item, it is only used to "wake up" the get() coroutine. I would be something like: 1. get(): in case the queue is empty, create a Future, add it to "_getters", then begin waiting for this future; 2. When an item is being put(): 2.1. add the item to the internal queue 2.2. take the first getter and set its result to None 3. the get() coroutine resumes, and then takes an item from the internal queue, returning it. Clearly, in this design, handling cancellation in get is much simpler: you don't need to do anything, just let it cancel, no cleanup action needed. |
|
|
msg248150 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2015-08-06 21:38 |
> A better design is to make it so the future that get() is waiting for doesn't actually receive the item, it is only used to "wake up" the get() coroutine. I would be something like: > > 1. get(): in case the queue is empty, create a Future, add it to "_getters", then begin waiting for this future; > 2. When an item is being put(): > 2.1. add the item to the internal queue > 2.2. take the first getter and set its result to None > 3. the get() coroutine resumes, and then takes an item from the internal queue, returning it. > > Clearly, in this design, handling cancellation in get is much simpler: you don't need to do anything, just let it cancel, no cleanup action needed. I like this a lot. Do you want to draft a patch? :) |
|
|
msg248157 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-08-06 22:47 |
Sure, just give me a couple of days. |
|
|
msg248181 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2015-08-07 08:53 |
So this looks like it will miss 3.5.0rc1. How confident are we that the new patch won't introduce new bugs? This late in the release process that would be awkward. On Fri, Aug 7, 2015 at 12:47 AM, Gustavo J. A. M. Carneiro < report@bugs.python.org> wrote: > > Gustavo J. A. M. Carneiro added the comment: > > Sure, just give me a couple of days. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue23812> > _______________________________________ > |
|
|
msg248187 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-08-07 13:06 |
I was wrong, there still needs to be some cleanup in cancellation, even with the new approach. But it does solve the out-of-order problem. I don't know if it should be applied to rc1. I wish I had more time to test. Up to you guys. |
|
|
msg248203 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2015-08-07 16:14 |
Guido, I agree, let's not push the updated implementation in 3.5.0. Gustavo, could you please generate the patch with "hg diff", so that code review here works? And I think we need a new issue to track the new patch. |
|
|
msg248206 - (view) |
Author: Gustavo J. A. M. Carneiro (gustavo) * |
Date: 2015-08-07 16:36 |
I am not using hg anymore, since asyncio migrated to git. Here's a github PR, does that help? https://github.com/python/asyncio/pull/260 |
|
|
msg248207 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2015-08-07 16:37 |
Yep, GH works. Thanks! |
|
|