
[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

Sean Harrington seanharr11 at gmail.com
Fri Sep 28 19:23:06 EDT 2018


Hi Antoine - see inline below for my response...thanks for your time!

On Fri, Sep 28, 2018 at 6:45 PM Antoine Pitrou <solipsis at pitrou.net> wrote:

> Hi,
>
> On Fri, 28 Sep 2018 17:07:33 -0400
> Sean Harrington <seanharr11 at gmail.com> wrote:
> >
> > In short, the implementation of the feature works as follows:
> >
> >   1. Exposes a kwarg on Pool.__init__ called expect_initret, that
> >      defaults to False. When set to True:
> >        1. Capture the return value of the initializer kwarg of Pool
> >        2. Pass this value to the function being applied, as a kwarg.
> >
> > Again, in short, the motivation of the feature is to provide an explicit
> > "flow of data" from parent process to worker process, and to avoid being
> > forced to use the global keyword in initializer, or being forced to
> > create global variables in the parent process.
>
> Thanks for taking the time to explain your use case and write a proposal.
>
> My reactions to this are:
>
> 1. The proposed API is ugly. This basically allows you to pass an
>    argument which changes with which arguments another function is
>    later called...

Yes, I agree that this is not a perfect contract, but isn't this also a concern with the current implementation? And isn't this pattern arguably more explicit than "the function being applied relying on the initializer to create a global variable from within its lexical scope"?
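
For readers following along, the current pattern referred to here can be sketched as follows. This is a minimal illustration and not code from the proposal: init_worker, lookup, and the dictionary standing in for an expensive resource are hypothetical names.

```python
import multiprocessing as mp

# Hypothetical stand-in for an expensive, process-wide resource.
_resource = None


def init_worker(table):
    """Runs once in each worker and publishes the resource via a global."""
    global _resource
    _resource = dict(table)


def lookup(key):
    """Applied function: implicitly depends on init_worker having run."""
    return _resource[key]


def main():
    table = {"a": 1, "b": 2}
    with mp.Pool(2, initializer=init_worker, initargs=(table,)) as pool:
        print(pool.map(lookup, ["a", "b", "a"]))


if __name__ == "__main__":
    main()
```

Nothing in the signature of lookup reveals that it needs init_worker to have run first, which is the implicit coupling being discussed.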

> 2. A global variable seems like the adequate way to represent a
>    process-global object (which is exactly your use case)

There is nothing wrong with using a global variable, especially in the toy examples of multiprocessing.Pool found all over the internet (e.g. optimizing a simple script). But what happens when you have lots of nested function calls in your applied function? My simple argument is that the developer should not be constrained to make the passed objects globally available in the process, as this MAY break encapsulation for large projects.

> 3. If you don't like globals, you could probably do something like
>    lazily-initialize the resource when a function needing it is
>    executed; this also avoids creating the resource if the child
>    doesn't use it at all. Would that work for you?

I have nothing against globals; my gripe is with being forced to use them for every Pool use case. Further, if initializing the resource is expensive, we only want to do this ONE time per worker process. So no, this will not always work.
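
One way to read the lazy-initialization suggestion quoted above is sketched below. This is an assumption about what was meant, not code from either message: get_resource and work are hypothetical names, and functools.lru_cache is used only as a convenient per-process memoizer.

```python
import functools
import multiprocessing as mp
import os


@functools.lru_cache(maxsize=None)
def get_resource():
    """Build the (hypothetical) expensive resource on first use in this process."""
    return {"pid": os.getpid(), "table": {"a": 1, "b": 2}}


def work(key):
    # The first call in each worker pays the construction cost;
    # later calls in the same worker reuse the cached object.
    return get_resource()["table"][key]


def main():
    with mp.Pool(2) as pool:
        print(pool.map(work, ["a", "b", "a"]))


if __name__ == "__main__":
    main()
```

In this sketch the cache is still module-level state, and the applied function still reaches for it rather than receiving it as an argument.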

> As a more general remark, I understand the desire to make the Pool object more flexible, but we can also not pile up features until it satisfies all use cases.

I understand that this is a legitimate concern, but this is about API approachability. Python end-users of Pool are forced to declare a global from a lexical scope. Most Python end-users probably don't even know this is possible. Sure, this is adding a feature for a use case that I outlined, but really this is one of the two major use cases of "initializer" and "initargs" (see my blog post for the 2 generalized use cases <https://thelaziestprogrammer.com/python/multiprocessing-pool-expect-initret-proposal>), not some obscure use case. This is making that very common use case more approachable.

> As another general remark, concurrent.futures is IMHO the preferred API for the future, and where feature work should probably concentrate.

This is good to hear and know. I will keep this in mind moving forward!

> Regards
>
> Antoine.

