[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Thomas Moreau thomas.moreau.2010 at gmail.com
Fri Oct 19 02:57:35 EDT 2018


Hello,

I have been working on the concurrent.futures module lately, and I think this optimization should be avoided in the context of Python Pools.

This is an interesting idea; however, its implementation would bring many complicated issues, as it breaks the basic paradigm of a Pool: the tasks are independent, and you don't know which worker is going to run which task.

The function is serialized with each task because of this paradigm. This ensures that any worker picking up the task will be able to perform it independently of the tasks it has run before, provided that it has been initialized correctly at the beginning. This makes each task simple to run.
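To make the per-task serialization cost concrete, here is a small sketch using only the standard `pickle` and `functools` modules; the names `scale` and `FACTORS` are made up for illustration. A module-level function pickles by reference (module and qualified name), so resending it is cheap, but a `functools.partial` that binds a large object carries that object in every pickle, and the Pool pickles the callable again with each task it submits (for `map`, with each chunk of items).

```python
import functools
import pickle


def scale(factor_table, x):
    """Toy task function: apply a per-item factor looked up in a shared table."""
    return factor_table[x % len(factor_table)] * x


# Large read-only data that every task needs to see.
FACTORS = list(range(100_000))

# A module-level function pickles by reference (module + qualified name): cheap.
plain_size = len(pickle.dumps(scale))

# Binding the data with functools.partial makes the whole list part of the
# callable's pickle, so this payload is re-sent with every task submitted.
bound = functools.partial(scale, FACTORS)
bound_size = len(pickle.dumps(bound))

print(f"plain function:    {plain_size} bytes")
print(f"partial with data: {bound_size} bytes")
```

Passing `bound` to `Pool.map` would therefore ship roughly `bound_size` bytes per chunk, which appears to be the overhead the original proposal is trying to avoid.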

As the Pool comes with no scheduler, your idea would require a synchronization step to send the function to all workers before you can launch your task. But if one worker is already performing a long-running task, does the Pool wait for it to finish before sending the function? If the Pool doesn't wait, how does it ensure that this worker will get the definition of the function before running it? Also, multiprocessing.Pool has features where a worker can shut itself down after a given number of tasks (maxtasksperchild) or a timeout. How does the Pool ensure that the replacement worker will have the definition of the function? It is unsafe to try such a feature (sending an object only once) anywhere other than in the initializer, which is guaranteed to run once per worker.

On the other hand, you raised an interesting point: making globals available in the workers could be made simpler. A possible solution would be to add a "globals" argument to the Pool that would instantiate global variables in the workers. I have no specific idea for the implementation of such a feature, but it would be safer because it would be an initialization feature.

Regards, Thomas Moreau

On Thu, Oct 18, 2018, 22:20 Chris Jerdonek <chris.jerdonek at gmail.com> wrote:

On Thu, Oct 18, 2018 at 9:11 AM Michael Selik <michael.selik at gmail.com> wrote:
> On Thu, Oct 18, 2018 at 8:35 AM Sean Harrington <seanharr11 at gmail.com> wrote:
>> Further, let me pivot on my idea of __qualname__... we can use the id of func as the cache key to address your concern, and store this id on the task tuple (i.e. an integer in-lieu of the func previously stored there).
>
> Possible. Does the Pool keep a reference to the passed function in the main process? If not, couldn't the garbage collector free that memory location and a new function could replace it? Then it could have the same __qualname__ and id in CPython. Edge case, for sure. Worse, it'd be hard to reproduce as it'd be dependent on the vagaries of memory allocation.
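The concern about id reuse is well founded: the Python documentation for `id()` states that two objects with non-overlapping lifetimes may have the same `id()` value. One way around it, sketched here under the assumption that an `id(func)`-keyed cache lives in the main process, is to store the function object itself next to its key, which keeps it alive so its id cannot be reused while the entry exists. `FuncRegistry` and `make_task` are hypothetical names, not part of multiprocessing.

```python
class FuncRegistry:
    """Hypothetical id()-keyed registry of task functions, kept in the main process.

    Storing the function object next to its key holds a strong reference, so
    the object cannot be freed and its id() cannot be handed to a new object
    while the entry exists.
    """

    def __init__(self):
        self._funcs = {}

    def register(self, func):
        key = id(func)
        self._funcs[key] = func  # the strong reference is what makes the key safe
        return key

    def get(self, key):
        return self._funcs[key]

    def unregister(self, key):
        # Once the last reference is gone, this id may be reused by a new object.
        del self._funcs[key]


def make_task(n):
    """Build a fresh function object on each call (hypothetical task factory)."""
    return lambda x: x + n


# Short-lived objects: each lambda is freed right after id() returns, so the
# two ids may coincide in CPython (not guaranteed either way).
print(id(make_task(1)) == id(make_task(2)))

registry = FuncRegistry()
f, g = make_task(1), make_task(2)
kf, kg = registry.register(f), registry.register(g)
assert kf != kg  # both objects are alive, so their ids must differ
print(registry.get(kf)(10), registry.get(kg)(10))  # 11 12
```

This only addresses id reuse in the main process; it does not answer how workers would learn about new entries, which is the synchronization problem raised in the reply above.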

I'm not following this thread closely, but I just wanted to point out that __qualname__ won't necessarily be an attribute of the object if the API accepts any callable. (I happen to be following an issue on the tracker where this came up.)
--Chris
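To illustrate this point with standard-library objects: a plain function has `__qualname__`, but a `functools.partial` object or an instance of a class that defines `__call__` does not, even though both are valid callables for `Pool.map`. The `describe()` helper and the `Multiplier` class below are hypothetical, shown only as one possible fallback for code that wants a readable label for any callable.

```python
import functools


def double(x):
    return 2 * x


class Multiplier:
    """A callable instance: fine to pass to Pool.map, but it has no __qualname__."""

    def __init__(self, k):
        self.k = k

    def __call__(self, x):
        return self.k * x


def describe(func):
    """Hypothetical helper: label any callable without assuming __qualname__ exists."""
    name = getattr(func, "__qualname__", None)
    if name is not None:
        return name
    if isinstance(func, functools.partial):
        return f"partial({describe(func.func)})"
    return f"{type(func).__qualname__} instance"


for f in (double, len, functools.partial(double), Multiplier(3)):
    print(describe(f), hasattr(f, "__qualname__"))
# double True
# len True
# partial(double) False
# Multiplier instance False
```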




