[Python-Dev] PEP 3148 ready for pronouncement

Brian Quinlan brian at sweetapp.com
Mon May 24 11:36:50 CEST 2010


On May 24, 2010, at 5:16 AM, Glyph Lefkowitz wrote:

> On May 23, 2010, at 2:37 AM, Brian Quinlan wrote:

>> On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:

>>> On May 22, 2010, at 8:47 PM, Brian Quinlan wrote:
>>>
>>>> Jesse, the designated pronouncer for this PEP, has decided to keep
>>>> discussion open for a few more days. So fire away!
>>>
>>> As you wish!
>>
>> I retract my request ;-)
>
> May you get what you wish for, may you find what you are seeking :).
>
>>> The PEP should be consistent in its usage of terminology about
>>> callables. It alternately calls them "callables", "functions", and
>>> "functions or methods". It would be nice to clean this up and be
>>> consistent about what can be called where. I personally like
>>> "callables".
>>
>> Did you find the terminology confusing? If not then I propose not
>> changing it.
>
> Yes, actually. Whenever I see references to the multiprocessing
> module, I picture a giant "HERE BE (serialization) DRAGONS" sign.
> When I saw that some things were documented as being "functions", I
> thought that maybe there was intended to be a restriction like "these
> can only be top-level functions so they're easy for different
> executors to locate and serialize". I didn't realize that the intent
> was "arbitrary callables" until I carefully re-read the document and
> noticed that the terminology was inconsistent.

ProcessPoolExecutor has the same serialization perils that
multiprocessing does. My original plan was to link to the
multiprocessing docs to explain them but I couldn't find them listed.
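
To make the peril concrete, here is a minimal sketch (assuming the
sample implementation's "futures" package): callables and arguments
submitted to a ProcessPoolExecutor must be picklable, so a
module-level function works where a lambda does not.

    import futures  # the sample implementation's package

    def square(x):  # a module-level function pickles fine
        return x * x

    if __name__ == '__main__':
        executor = futures.ProcessPoolExecutor()
        try:
            print(executor.submit(square, 3).result())  # prints 9
            # executor.submit(lambda x: x * x, 3)  # fails: lambdas
            #                                      # can't be pickled
        finally:
            executor.shutdown()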

>> But changing it in the user docs is probably a good idea. I like
>> "callables" too.
>
> Great. Still, users will inevitably find the PEP and use it as
> documentation too.

>>> The execution context of callable code is not made clear.
>>> Implicitly, submit() or map() would run the code in threads or
>>> processes as defined by the executor, but that's not spelled out
>>> clearly.
>
> Any response to this bit? Did I miss something in the PEP?

Yes, the execution context is Executor-dependent. The sections on
ProcessPoolExecutor and ThreadPoolExecutor spell this out, I think.
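
A quick way to see this (again assuming the sample "futures" package):
with ThreadPoolExecutor the callable runs in a worker thread of the
submitting process, while with ProcessPoolExecutor it runs in a child
process.

    import os
    import threading
    import futures

    def where_am_i():
        return (os.getpid(), threading.current_thread().name)

    executor = futures.ThreadPoolExecutor(1)
    try:
        print(where_am_i())                          # this pid, 'MainThread'
        print(executor.submit(where_am_i).result())  # same pid, worker thread
    finally:
        executor.shutdown()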

>>> More relevant to my own interests, the execution context of the
>>> callables passed to add_done_callback and remove_done_callback is
>>> left almost completely to the imagination. If I'm reading the
>>> sample implementation correctly,
>>> <http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241>,
>>> it looks like in the multiprocessing implementation, the done
>>> callbacks are invoked in a random local thread. The fact that they
>>> are passed the future itself sort of implies that this is the case,
>>> but the multiprocessing module plays fast and loose with object
>>> identity all over the place, so it would be good to be explicit and
>>> say that it's not a pickled copy of the future sitting in some
>>> arbitrary process (or even on some arbitrary machine).

>> The callbacks will always be called in a thread other than the main
>> thread in the process that created the executor. Is that a strong
>> enough contract?
>
> Sure. Really, almost any contract would work, it just needs to be
> spelled out. It might be nice to know whether the thread invoking the
> callbacks is a daemon thread or not, but I suppose it's not strictly
> necessary.

Your concern is that the thread will be killed when the interpreter
exits? It won't be.
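
A sketch of that contract in action (sample "futures" package assumed;
the exact thread name is an implementation detail):

    import threading
    import futures

    def work():
        return 42

    def report(future):
        # Per the contract above, this runs in an executor-owned
        # thread, not the main thread.
        print('callback ran in %r with result %r'
              % (threading.current_thread().name, future.result()))

    executor = futures.ThreadPoolExecutor(1)
    executor.submit(work).add_done_callback(report)
    executor.shutdown()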

>>> This is really minor, I know, but why does it say "NOTE: This
>>> method can be used to create adapters from Futures to Twisted
>>> Deferreds"? First of all, what's the deal with "NOTE"; it's the
>>> only "NOTE" in the whole PEP, and it doesn't seem to add anything.
>>> This sentence would read exactly the same if that word were
>>> deleted. Without more clarity on the required execution context of
>>> the callbacks, this claim might not actually be true anyway;
>>> Deferred callbacks can only be invoked in the main reactor thread
>>> in Twisted. But even if it is perfectly possible, why leave so much
>>> of the adapter implementation up to the imagination? If it's
>>> important enough to mention, why not have a reference to such an
>>> adapter in the reference Futures implementation, since it should be
>>> fairly trivial to write?

>> I'm a bit surprised that this doesn't allow for better
>> interoperability with Deferreds given this discussion:
>
> I did not communicate that well. As implemented, it's quite possible
> to implement a translation layer which turns a Future into a
> Deferred. What I meant by that comment was, the specification in the
> PEP was too loose to be sure that such a layer would work with
> arbitrary executors. For what it's worth, the Deferred translator
> would look like this, if you want to include it in the PEP (untested
> though, you may want to run it first):
>
>     from twisted.internet.defer import Deferred
>     from twisted.internet.reactor import callFromThread
>
>     def future2deferred(future):
>         d = Deferred()
>         def invoke_deferred():
>             try:
>                 result = future.result()
>             except:
>                 d.errback()
>             else:
>                 d.callback(result)
>         def done_callback(same_future):
>             callFromThread(invoke_deferred)
>         future.add_done_callback(done_callback)
>         return d
>
> This does beg the question of what the traceback will look like in
> that except: block though. I guess the multi-threaded executor will
> use python3 exception chaining so Deferred should be able to show a
> sane traceback in case of an error, but what about exceptions in
> other processes?
>
>>> I suggest having add_done_callback, implementing it with a list so
>>> that callbacks are always invoked in the order that they're added,
>>> and getting rid of remove_done_callback.
>>
>> Sounds good to me!
>
> Great! :-)
>
>>> futures.base.Executor isn't exposed publicly, but it needs to be.
>>> The PEP kinda makes it sound like it is ("Executor is an abstract
>>> class..."). Plus, a third party library wanting to implement an
>>> executor of its own shouldn't have to copy and paste the
>>> implementation of Executor.map.
>>
>> That was a bug that I've fixed. Thanks!
>
> Double-great!
>
>>> One minor suggestion on the "internal future methods" bit -
>>> something I wish we'd done with Deferreds was to put 'callback()'
>>> and 'addCallbacks()' on separate objects, so that it was very
>>> explicit whether you were on the emitting side of a Deferred or the
>>> consuming side. That seems to be the case with these internal
>>> methods - they are not so much "internal" as they are for the
>>> producer of the Future (whether a unit test or executor), so you
>>> might want to put them on a different object that it's easy for the
>>> thing creating a Future() to get at but hard for any subsequent
>>> application code to fiddle with by accident. Off the top of my
>>> head, I suggest naming it "Invoker()". A good way to do this would
>>> be to have an Invoker class which can't be instantiated (raises an
>>> exception from __init__ or somesuch), then a Future.create() method
>>> which returns an Invoker, which itself has a '.future' attribute.
>
> No reaction on this part? I think you'll wish you did this in a
> couple of years when you start bumping into application code that
> calls "set_result" :).

My reactions are mixed ;-)

Your proposal is to add a level of indirection to make it harder for
people to call implementation methods. The downside is that it makes
it a bit harder to write tests and Executors. I also can't see a big
problem in letting people call set_result in client code though it is
documented as being only for Executor implementations and tests.

On the implementation side, I don't see why an Invoker needs a
reference to the future. Each Invoker could own one Future. A
reference to the Invoker is kept by the Executor, and its future is
returned to the client, i.e.:

class Invoker(object):
    def __init__(self):
        """Should only be called by Executor implementations."""
        self.future = Future()

    def set_running_or_notify_cancel(self):
        ...  # Messes with self.future's internals

    def set_result(self, result):
        ...  # Messes with self.future's internals

    def set_exception(self, exception):
        ...  # Messes with self.future's internals
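
For what it's worth, a hypothetical executor would consume this API
roughly like so (a sketch only; the work queue and worker loop are
made up for illustration):

    def submit(self, fn, *args, **kwargs):
        invoker = Invoker()              # the Executor keeps the Invoker...
        self._work_queue.put((invoker, fn, args, kwargs))
        return invoker.future            # ...the client sees only the Future

    # and in a worker thread:
    #     if invoker.set_running_or_notify_cancel():
    #         try:
    #             result = fn(*args, **kwargs)
    #         except BaseException as e:
    #             invoker.set_exception(e)
    #         else:
    #             invoker.set_result(result)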

Cheers, Brian

>>> Finally, why isn't this just a module on PyPI? It doesn't seem like
>>> there's any particular benefit to making this a stdlib module and
>>> going through the whole PEP process - except maybe to prompt
>>> feedback like this :).

>> We've already had this discussion before. Could you explain why this
>> module should not be in the stdlib, e.g. does it have significantly
>> less utility than other modules in the stdlib? Is it significantly
>> higher risk? etc.?
>
> You've convinced me, mainly because I noticed later on in the
> discussion that it has been released to pypi for several months, and
> does have a bunch of downloads. It doesn't have quite the popularity
> I'd personally like to see for stdlib modules, but it's not like you
> didn't try, and you do (sort of) have a point about small modules
> being hard to get adoption. I'm sorry that this, my least interesting
> point in my opinion, is what has seen the most discussion so far. I'd
> appreciate it if you could do a release to pypi with the bugfixes you
> mentioned here, to make sure that the released version is consistent
> with what eventually gets into Python.
>
> Oh, and one final nitpick:
> <http://www.rfc-editor.org/rfc/rfc2606.txt> says you really should
> not put real domain names into your "web crawl example", especially
> not "some-made-up-domain.com".





