
[Python-ideas] Tulip / PEP 3156 - subprocess events

Guido van Rossum guido at python.org
Fri Jan 18 23:15:07 CET 2013


On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

Paul Moore wrote:

PS From the PEP, it seems that a protocol must implement the 4 methods connection_made, data_received, eof_received and connection_lost. For a process, which has 2 output streams involved, a single data_received method isn't enough.

It looks like there would have to be at least two Transport instances involved, one for stdin/stdout and one for stderr.

Connecting them both to a single Protocol object doesn't seem to be possible with the framework as defined. You would have to use a couple of adapter objects to translate the data_received calls into calls on different methods of another object.

So far this makes sense.
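The adapter idea from the quoted message could be realized roughly like this: a tiny protocol object that forwards data_received() calls to a chosen method of some other object. The class and method names below are hypothetical, not from any library.

```python
class DataAdapter:
    """Forwards the fixed transport callbacks to a named method of `target`."""

    def __init__(self, target, method_name):
        self.target = target
        self.method_name = method_name

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Translate the fixed callback name into the target's own method.
        getattr(self.target, self.method_name)(data)

    def eof_received(self):
        pass

    def connection_lost(self, exc):
        pass
```

With one Transport for stdout and one for stderr, you would attach two adapters pointing at, say, stdout_received() and stderr_received() of a single handler object.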

But for this specific case there's a simpler solution -- require the protocol to support a few extra methods, in particular, err_data_received() and err_eof_received(), which are to stderr what data_received() and eof_received() are for stdout. (After all, the point of a subprocess is that "normal" data goes to stdout.) There's only one input stream to the subprocess, so there's no ambiguity for write(), and neither is there a need for multiple connection_made()/lost() methods. (However, we could argue endlessly over whether connection_lost() should be called when the subprocess exits, or when the other side of all three pipes is closed. :-)
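A sketch of the protocol interface described above: the standard data_received()/eof_received() callbacks carry stdout, while the extra err_data_received()/err_eof_received() methods carry stderr. Only the method names come from the message; the class body is illustrative.

```python
class SubprocessProtocol:
    def __init__(self):
        self.out_chunks = []
        self.err_chunks = []

    def connection_made(self, transport):
        # Called once; write() on the transport feeds the child's stdin.
        self.transport = transport

    def data_received(self, data):
        # "Normal" data: the child's stdout.
        self.out_chunks.append(data)

    def eof_received(self):
        pass  # stdout closed

    def err_data_received(self, data):
        # The proposed extra callback: the child's stderr.
        self.err_chunks.append(data)

    def err_eof_received(self):
        pass  # stderr closed

    def connection_lost(self, exc):
        pass  # process exited (or all three pipes closed -- the open question)
```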

This sort of thing would be easier if, instead of the Transport calling a predefined method of the Protocol, the Protocol installed a callback into the Transport. Then a Protocol designed for dealing with subprocesses could hook different methods of itself into a pair of Transports.

Hm. Not excited. I like everyone using the same names for these callback methods, so that a reader (who is familiar with the transport/protocol API) can instantly know what kind of callback it is and what its arguments are. (But see Nick's simple solution for having your cake and eating it, too.)

Stepping back a bit, I must say that from the coroutine viewpoint, the Protocol/Transport stuff just seems to get in the way. If I were writing coroutine-based code to deal with a subprocess, I would want to be able to write coroutines like

def handle_output(stdout):
    while 1:
        line = yield from stdout.readline()
        if not line:
            break
        mungulate_line(line)

def handle_errors(stderr):
    while 1:
        line = yield from stderr.readline()
        if not line:
            break
        complain_to_user(line)

In other words, I don't want Transports or Protocols or any of that cruft, I just want a simple pair of async stream objects that I can read and write using yield-from calls. There doesn't seem to be anything like that specified in PEP 3156.

This is a good observation -- one that I've made myself as well. I also have a plan for dealing with it -- but I haven't coded it up properly yet and consequently I haven't written it up for the PEP yet either.

The idea is that there will be some even-higher-level functions for tasks to call to open connections (etc.) which just give you two unidirectional streams (one for reading, one for writing). The write-stream can just be the transport (its write() and writelines() methods are familiar from regular I/O streams) and the read-stream can be a StreamReader -- a class I've written but which needs to be moved into a better place: http://code.google.com/p/tulip/source/browse/tulip/http_client.py#37
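The "two unidirectional streams" design described here is essentially what later shipped in the stdlib as asyncio.open_connection(), which hands back a (StreamReader, StreamWriter) pair. A self-contained modern sketch, using a throwaway local echo server so it runs anywhere:

```python
import asyncio

async def demo():
    # A throwaway echo server so the example is self-contained.
    async def handle(reader, writer):
        data = await reader.readline()
        writer.write(b"echo:" + data)
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # The client side: just a read-stream and a write-stream;
    # no Protocol class appears anywhere in user code.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hi\n")
    await writer.drain()
    line = await reader.readline()
    writer.close()
    server.close()
    await server.wait_closed()
    return line

line = asyncio.run(demo())
```

The client half is exactly the shape Greg asks for: coroutine code reading lines from a stream object, with the transport/protocol machinery hidden underneath.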

Anyway, the reason for having the transport/protocol abstractions in the middle is so that other frameworks can ignore coroutines if they want to -- all they have to do is work with Futures, which can be fully controlled through callbacks (which are native at the lowest level of almost all frameworks, including Tulip / PEP 3156).
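The Future-based interoperability point can be illustrated with a minimal sketch: a callback-style consumer attaches to a Future via add_done_callback() and never touches yield from / await itself.

```python
import asyncio

results = []

def on_done(fut):
    # Plain callback style: runs when the future receives its result.
    results.append(fut.result())

async def main():
    fut = asyncio.get_running_loop().create_future()
    fut.add_done_callback(on_done)   # callback-world hooks in here
    fut.set_result(42)
    await asyncio.sleep(0)           # yield to the loop so the callback runs

asyncio.run(main())
```

Coroutine code would instead simply `await fut`; both styles observe the same Future, which is the interoperability Guido is describing.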

It does mention something about implementing a streaming buffer on top of a Transport, but in a way that makes it sound like a suggested recipe rather than something to be provided by the library. Also it seems like a lot of layers of overhead to go through.

It'll be in the stdlib, no worries. I don't expect the overhead to be a problem.

On the whole, in PEP 3156 the idea of providing callback-based interfaces with yield-from-based ones built on top has been pushed way further up the stack than I imagined it would. I don't want to be forced to write my coroutine code at the level of Protocols; I want to be able to work at a lower level than that.

You can write an alternative framework using coroutines and callbacks, bypassing transports and protocols. (You'll still need Futures.) However you'd be missing the interoperability offered by the protocol/transport abstractions: in an IOCP world you'd have to interact with the event loop's callbacks differently than in a select/poll/etc. world.

PEP 3156 is trying to make different groups happy: people who like callbacks, people who like coroutines; people who like UNIX, people who like Windows. Everybody may have to compromise a little bit, but the reward will (hopefully) be better portability and better interoperability.

--
--Guido van Rossum (python.org/~guido)


