[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals
Sean Harrington seanharr11 at gmail.com
Fri Oct 12 09:42:50 EDT 2018
I would contend that this is much more granular than Dask - this is just an
optimization of Pool.map() to avoid passing the same func to each worker
again and again, once per task, with the primary goal of eliminating
redundant serialization of callables that have a large memory footprint.
This is a different use case than Dask - I don't intend to approach the
shared memory or distributed computing realms.
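
To give a rough sense of the serialization cost in question, here is a minimal sketch that only measures the pickled size of a callable; it does not start a Pool. The lookup tables, the partial objects, and the batch count are all hypothetical stand-ins chosen for illustration, and the assumption that the callable travels with every dispatched batch reflects my reading of the current Pool.map behavior rather than anything guaranteed.

```python
import functools
import operator
import pickle

# Hypothetical stand-ins: a membership test bound to a small and a large table.
small_table = frozenset(range(10))
large_table = frozenset(range(100_000))

small_func = functools.partial(operator.contains, small_table)
large_func = functools.partial(operator.contains, large_table)

# Size of the bytes that must be produced each time the callable is serialized.
small_size = len(pickle.dumps(small_func))
large_size = len(pickle.dumps(large_func))

# Assumption for illustration: 1,000 items dispatched in batches of 10,
# with the callable serialized again for every batch.
n_batches = 1_000 // 10

print(f"small callable: {small_size} bytes per batch")
print(f"large callable: {large_size} bytes per batch, "
      f"about {large_size * n_batches / 1e6:.1f} MB over {n_batches} batches")
```

Under these assumptions the table dominates the cost, which is the redundancy a one-time transfer per worker would remove.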
And the second call to Pool.map would update the cached "self" as part of
its initialization workflow, so that "the latest version of self when map()
is called is taken into account".
Do you see a difficulty in accomplishing the second behavior?
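
The "frozen when map() is called" semantics can be emulated without any workers, as a sketch. Here pickle.dumps stands in for the moment a map() call serializes the callable, and pickle.loads stands in for what a worker would receive; the `weights` dict and its bound `get` method are hypothetical substitutes for "self" and "self.func".

```python
import pickle

# Hypothetical parent-side state, playing the role of "self".
weights = {"bias": 1}

# First "map() call": the bound method and its state are serialized now.
first_snapshot = pickle.dumps(weights.get)

# The parent mutates its state between the two calls.
weights["bias"] = 2

# Second "map() call": serializing again picks up the latest state.
second_snapshot = pickle.dumps(weights.get)

# What a worker would reconstruct from each snapshot.
first_worker_view = pickle.loads(first_snapshot)
second_worker_view = pickle.loads(second_snapshot)

print(first_worker_view("bias"))   # state as of the first call
print(second_worker_view("bias"))  # state as of the second call
```

A cache of "self" on the workers would need to reproduce the second result, i.e. be refreshed whenever a new map() call begins.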
On Fri, Oct 12, 2018 at 9:25 AM Antoine Pitrou <antoine at python.org> wrote:
On 12/10/2018 at 15:17, Sean Harrington wrote:
> The implementation details need to be fleshed out, but agnostic of
> these, do you believe this is a valid solution to the initial problem? Do
> you also see it as a beneficial optimization to Pool, given that we
> don't need to store funcs/bound-methods/partials on the tasks themselves?

I'm not sure, TBH. I also think it may be better to leave this to higher
levels (for example Dask will intelligently distribute data on workers
and let you work with a kind of proxy object in the main process,
transferring data only when necessary).

> The latter concern about "what happens if `self` changed value in the
> parent" is the same concern as "what happens if `func` changes in the
> parent?" given the current implementation. This is an assumption that is
> currently made with Pool.map_async(func, ls). If "func" changes in the
> parent, there is no communication with the child. So one just needs to
> be aware that calling "map_async(self.func, ls)" while the state of
> "self" is changing will not communicate changes to each worker. The
> state is frozen when Pool.map is called, just as is the case now.

If you cache "self" between pool.map calls, then the question is not
"what happens if self changes during a map() call" but "what happens if
self changes between two map() calls"? While the former is intuitively
undefined, current users would expect the latter to have a clear answer,
which is: the latest version of self when map() is called is taken into
account.

Regards

Antoine.