[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Antoine Pitrou solipsis at pitrou.net
Sat Sep 29 06:04:47 EDT 2018


Hi Sean,

On Fri, 28 Sep 2018 19:23:06 -0400 Sean Harrington <seanharr11 at gmail.com> wrote:

> My simple argument is that the developer should not be constrained to make the objects passed globally available in the process, as this MAY break encapsulation for large projects.

IMHO, global variables don't break encapsulation if they remain private to the module where they actually play a role.

Of course, there are also global-like alternatives to globals, such as class attributes... The multiprocessing module itself uses globals (or quasi-globals) internally for various implementation details.
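As a rough illustration of the class-attribute alternative mentioned above, here is a minimal sketch. The names `Worker`, `setup` and `process` are hypothetical and do not come from multiprocessing itself; the state lives on the class rather than in a module-level variable, but it is still per-process shared state.

```python
class Worker:
    # Class attribute acting as quasi-global, per-process state.
    # "resource" is an illustrative name, not a multiprocessing API.
    resource = None

    @classmethod
    def setup(cls, resource):
        # Store the shared object on the class once per process.
        cls.resource = resource

    @classmethod
    def process(cls, x):
        # Use the stored object; here it is simply added to the input.
        return cls.resource + x
```

Under these assumptions, `Worker.setup` could presumably be passed as the Pool `initializer` and `Worker.process` as the mapped function, since both are importable by name.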

> > 3. If you don't like globals, you could probably do something like
> > lazily-initialize the resource when a function needing it is executed;
> > this also avoids creating the resource if the child doesn't use it at
> > all. Would that work for you?
>
> I have nothing against globals, my gripe is with being enforced to use them for every Pool use case. Further, if initializing the resource is expensive, we only want to do this ONE time per worker process.

That's what I meant with lazy initialization: initialize it if not already done, otherwise just use the already-initialized resource. It's a common pattern.

(you can view it as a 1-element cache if you prefer)

> > As a more general remark, I understand the desire to make the Pool
> > object more flexible, but we can also not pile up features until it
> > satisfies all use cases.
>
> I understand that this is a legitimate concern, but this is about API approachability. Python end-users of Pool are forced to declare a global from a lexical scope. Most Python end-users probably don't even know this is possible.

Hmm... We might have a disagreement on the target audience of the multiprocessing module. multiprocessing isn't very high-level, I would expect it to be used by experienced programmers who know how to mutate a global variable from a lexical scope.

For non-programmer end-users, such as data scientists, there are higher-level libraries such as Celery (http://www.celeryproject.org/) and Dask distributed (https://distributed.readthedocs.io/en/latest/). Perhaps it would be worth mentioning them in the documentation.

Regards

Antoine.


