[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension (original) (raw)

[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Nathaniel Smith njs at pobox.com
Fri Oct 12 15:34:17 EDT 2018


On Fri, Oct 12, 2018, 06:09 Antoine Pitrou <solipsis at pitrou.net> wrote:

On Fri, 12 Oct 2018 08:33:32 -0400 Sean Harrington <seanharr11 at gmail.com> wrote: > Hi Nathaniel - this if this solution can be made performant, than I would > be more than satisfied. > > I think this would require removing "func" from the "task tuple", and > storing the "func" "once per worker" somewhere globally (maybe a class > attribute set post-fork?). > > This also has the beneficial outcome of increasing general performance of > Pool.map and friends. I've seen MANY folks across the interwebs doing > things like passing instance methods to map, resulting in "big" tasks, and > slower-than-sequential parallelized code. Parallelizing "instance methods" > by passing them to map, w/o needing to wrangle with staticmethods and > globals, would be a GREAT feature! It'd just be as easy as: > > Pool.map(self.func, ls) > > What do you think about this idea? This is something I'd be able to take > on, assuming I get a few core dev blessings...

Well, I'm not sure how it would work, so it's difficult to give an opinion. How do you plan to avoid passing "self"? By caching (by equality? by identity?)? Something else? But what happens if "self" changed value (in the case of a mutable object) in the parent? Do you keep using the stale version in the child? That would break compatibility...

I was just suggesting that within a single call to Pool.map, it would be reasonable optimization to only send the fn once to each worker. So e.g. if you have 5 workers and 1000 items, you'd only pickle fn 5 times, rather than 1000 times like we do now. I wouldn't want to get any fancier than that with caching data between different map calls or anything.

Of course even this may turn out to be too complicated to implement in a reasonable way, since it would require managing some extra state on the workers. But semantically it would be purely an optimization of current semantics.

-n

-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20181012/814918c5/attachment.html>



More information about the Python-Dev mailing list