msg77806
Author: Brian (merrellb)
Date: 2008-12-14 16:48
Despite carefully matching my get() and task_done() calls, I would often trigger "raise ValueError('task_done() called too many times')" in my multiprocessing.JoinableQueue (multiprocessing/queues.py). Looking over the code (and a lot of debug logging), the issue appears to arise from JoinableQueue.put() not being protected by any locking mechanism: a preemption after its first line allows other processes to resume before the _unfinished_tasks semaphore has been released. The simplest solution seems to be allowing task_done() to block while waiting to acquire _unfinished_tasks, replacing

    if not self._unfinished_tasks.acquire(False):
        raise ValueError('task_done() called too many times')

with simply

    self._unfinished_tasks.acquire()

This would, however, remove the error checking currently provided (given the many far subtler errors that can be made, I might argue it is of limited value). Alternatively, the JoinableQueue.put() method could be better protected.
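
For context, the put() in question boils down to two statements with nothing guarding the gap between them. The sketch below paraphrases the pre-fix method from multiprocessing/queues.py rather than quoting it exactly:

    # Paraphrase of the unprotected JoinableQueue.put() (not an exact copy).
    def put(self, item, block=True, timeout=None):
        Queue.put(self, item, block, timeout)
        # If the producer is preempted here, a consumer can get() the item
        # and call task_done() before _unfinished_tasks is released; the
        # non-blocking acquire in task_done() then fails and raises the
        # 'task_done() called too many times' ValueError.
        self._unfinished_tasks.release()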
|
|
msg78226
Author: Brian (merrellb)
Date: 2008-12-23 06:40
Here are a few stabs at how this might be addressed.

1) As originally suggested, allow task_done() to block while waiting to acquire _unfinished_tasks. This lets the put() process resume and release _unfinished_tasks, at which point task_done() unblocks. No harm, no foul, but you do lose some error checking (and maybe some performance?).

2) One can't protect JoinableQueue.put() by simply acquiring _cond before calling Queue.put(): fixed-size queues will block if the queue is full, causing a deadlock when task_done() can't acquire _cond. The most obvious solution would seem to be reimplementing JoinableQueue.put() (rather than simply calling Queue.put()) and releasing _unfinished_tasks inside the protected portion. Perhaps:

    def put(self, obj, block=True, timeout=None):
        assert not self._closed
        if not self._sem.acquire(block, timeout):
            raise Full
        self._notempty.acquire()
        self._cond.acquire()
        try:
            if self._thread is None:
                self._start_thread()
            self._buffer.append(obj)
            self._unfinished_tasks.release()
            self._notempty.notify()
        finally:
            self._cond.release()
            self._notempty.release()

We may be able to get away with not acquiring _cond, as _notempty would provide some protection. However, its relationship to get() isn't entirely clear to me, so I am not sure whether that would be sufficient.
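
To make the deadlock ruled out in option 2 concrete, here is an illustration (not proposed code) of the naive variant that merely wraps the existing Queue.put() in _cond, assuming a bounded queue that is currently full:

    # Naive protection that deadlocks on a full, fixed-size queue
    # (illustration of the approach ruled out above, not a proposal).
    def put(self, item, block=True, timeout=None):
        self._cond.acquire()
        try:
            # With the queue full, Queue.put() blocks until a consumer
            # get()s an item.  But a consumer that has finished its previous
            # item is stuck in task_done(), waiting for the _cond we hold,
            # and never reaches its next get(): deadlock.
            Queue.put(self, item, block, timeout)
            self._unfinished_tasks.release()
        finally:
            self._cond.release()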
|
|
msg84613
Author: Jesse Noller (jnoller)
Date: 2009-03-30 18:55
Hi Brian - do you have a chunk of code that exacerbates this? I'm having problems reproducing this, and need a test so I can prove out the fix.
|
|
msg84639
Author: Brian (merrellb)
Date: 2009-03-30 20:25
Hey Jesse, it was good meeting you at PyCon. I don't have anything handy at the moment although, if memory serves, the most trivial of examples seemed to illustrate the problem: basically any situation where a joinable queue keeps bumping up against being empty (i.e. retiring items faster than they are being fed) and does enough work between get() and task_done() to be preempted will eventually break. FWIW, I was running on a Windows box. I am afraid I am away from my computer until late tonight, but I can try to cook something up then (I presume you are sprinting today?). Also, I think the issue becomes clear when you think about what happens if JoinableQueue.task_done() gets preempted between its few lines. -brian
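
The kind of trivial example being described would look roughly like the sketch below. This is not code from the thread: the worker function, busy-work, and item count are illustrative, and on the affected versions the failure is intermittent, so it may take many iterations or several runs to trigger.

    # Stress the race: a consumer that retires items as fast as they arrive,
    # with a little work between get() and task_done() to invite preemption.
    from multiprocessing import JoinableQueue, Process

    def worker(q):
        while True:
            q.get()
            sum(range(1000))      # small amount of work before task_done()
            q.task_done()         # raises ValueError when the race is lost

    if __name__ == '__main__':
        q = JoinableQueue()
        p = Process(target=worker, args=(q,))
        p.daemon = True
        p.start()
        for i in range(100000):
            q.put(i)
            if not p.is_alive():
                print('worker died -- task_done() raised ValueError')
                break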
|
|
msg86044
Author: Brian (merrellb)
Date: 2009-04-16 21:16
Jesse, I am afraid my last post may have confused the issue. As I mentioned in my first post, the problem arises when JoinableQueue.put() is preempted between its two lines. Perhaps the easiest way to illustrate this is to exacerbate it by modifying JoinableQueue.put() to force a preemption at this inopportune time:

    import time

    def put(self, item, block=True, timeout=None):
        Queue.put(self, item, block, timeout)
        time.sleep(1)
        self._unfinished_tasks.release()

Almost any example will now fail:

    from multiprocessing import JoinableQueue, Process

    def printer(in_queue):
        while True:
            print in_queue.get()
            in_queue.task_done()

    if __name__ == '__main__':
        jqueue = JoinableQueue()
        Process(target=printer, args=(jqueue,)).start()
        jqueue.put("blah")
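
If editing multiprocessing/queues.py directly is inconvenient, the same forced preemption can be injected from the test script by monkey-patching the class. This is a sketch under the assumption that patching queues.JoinableQueue.put is acceptable for the experiment; the helper name racy_put is mine, and only the sleep is new relative to the stock two-step put():

    # Reproduce the window without editing the standard library: replace
    # JoinableQueue.put with a version that pauses between enqueueing the
    # item and releasing _unfinished_tasks (mirrors the modified put above).
    import time
    from multiprocessing import queues

    def racy_put(self, item, block=True, timeout=None):
        queues.Queue.put(self, item, block, timeout)
        time.sleep(1)                      # forced preemption window
        self._unfinished_tasks.release()

    queues.JoinableQueue.put = racy_put

Apply the patch in the parent process before any put() call; the consumer process never calls put(), so it does not need the patch.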
|
|
msg89735
Author: Filipe Fernandes (ffernand)
Date: 2009-06-26 18:53
I ran into the same problem and am grateful to Brian for reporting this, as I thought I was losing my mind. Brian noted that he was running Windows, and I can confirm that his test case is reproducible on my laptop running Ubuntu 9.04 with Python 2.6.2. I'm reluctant to try Brian's suggestions without additional comments, though, even if they do work; I'll be using this in production.
|
|
msg89812
Author: Brian (merrellb)
Date: 2009-06-29 05:55
Filipe, thanks for the confirmation. While I think the second option (i.e. properly protecting JoinableQueue.put()) is best, the first option (simply removing the 'task_done() called too many times' check) should be safe, presuming your get() and put() calls actually match. Jesse, any luck sorting out the best fix for this? I really think that JoinableQueue (in my opinion the most useful form of multiprocessing queue) can't be guaranteed to work on any system right now. -brian
|
|
msg89828
Author: Jesse Noller (jnoller)
Date: 2009-06-29 11:46
I'm leaning towards the fix that properly protects JoinableQueue.put(); I'm not a terribly big fan of removing error checking. I'm trying to carve off time this week to beat on my bug queue, so I'm hoping to be able to commit something (once I have docs+tests) this week.
|
|
msg90283
Author: Brian (merrellb)
Date: 2009-07-08 21:31
Cool. Let me know if there is anything I can do to help.
|
|
msg91346
Author: Jesse Noller (jnoller)
Date: 2009-08-06 02:08
Fix checked into Python trunk with r74326 and the 2.6 maintenance branch with r74327.
|
|
msg91347
Author: Jesse Noller (jnoller)
Date: 2009-08-06 02:10
I used the protected JoinableQueue.put() method suggested by Brian.
|
|