[Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted (original) (raw)

Guido van Rossum guido at python.org
Fri Dec 21 19:57:12 CET 2012

Previous message: [Python-Dev] What is the "sequence"? ([issue16728] collections.abc.Sequence shoud provide __subclasshook__
Next message: [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Dear python-dev and python-ideas,

I am posting PEP 3156 here for early review and discussion. As you can see from the liberally sprinkled TBD entries it is not done, but I am about to disappear on vacation for a few weeks and I am reasonably happy with the state of things so far. (Of course feedback may change this. :-) Also, there has already been some discussion on python-ideas (and even on Twitter) so I don't want python-dev to feel out of the loop -- this is a proposal for a new standard library module. (But no, I haven't picked the module name yet. :-)

There's an -- also incomplete -- reference implementation at http://code.google.com/p/tulip/ -- unlike the first version of tulip, this version actually has (some) unittests.

Let the bikeshedding begin!

(Oh, happy holidays too. :-)

-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- PEP: 3156 Title: Asynchronous IO Support Rebooted Version: RevisionRevisionRevision Last-Modified: DateDateDate Author: Guido van Rossum <guido at python.org> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 12-Dec-2012 Post-History: TBD

Abstract

This is a proposal for asynchronous I/O in Python 3, starting with Python 3.3. Consider this the concrete proposal that is missing from PEP 3153. The proposal includes a pluggable event loop API, transport and protocol abstractions similar to those in Twisted, and a higher-level scheduler based on yield from (PEP 380). A reference implementation is in the works under the code name tulip.

Introduction

The event loop is the place where most interoperability occurs. It should be easy for (Python 3.3 ports of) frameworks like Twisted, Tornado, or ZeroMQ to either adapt the default event loop implementation to their needs using a lightweight wrapper or proxy, or to replace the default event loop implementation with an adaptation of their own event loop implementation. (Some frameworks, like Twisted, have multiple event loop implementations. This should not be a problem since these all have the same interface.)

It should even be possible for two different third-party frameworks to interoperate, either by sharing the default event loop implementation (each using its own adapter), or by sharing the event loop implementation of either framework. In the latter case two levels of adaptation would occur (from framework A's event loop to the standard event loop interface, and from there to framework B's event loop). Which event loop implementation is used should be under control of the main program (though a default policy for event loop selection is provided).

Thus, two separate APIs are defined:

getting and setting the current event loop object
the interface of a conforming event loop and its minimum guarantees

An event loop implementation may provide additional methods and guarantees.

The event loop interface does not depend on yield from. Rather, it uses a combination of callbacks, additional interfaces (transports and protocols), and Futures. The latter are similar to those defined in PEP 3148, but have a different implementation and are not tied to threads. In particular, they have no wait() method; the user is expected to use callbacks.

For users (like myself) who don't like using callbacks, a scheduler is provided for writing asynchronous I/O code as coroutines using the PEP 380 yield from expressions. The scheduler is not pluggable; pluggability occurs at the event loop level, and the scheduler should work with any conforming event loop implementation.

For interoperability between code written using coroutines and other async frameworks, the scheduler has a Task class that behaves like a Future. A framework that interoperates at the event loop level can wait for a Future to complete by adding a callback to the Future. Likewise, the scheduler offers an operation to suspend a coroutine until a callback is called.

Limited interoperability with threads is provided by the event loop interface; there is an API to submit a function to an executor (see PEP 3148) which returns a Future that is compatible with the event loop.

Non-goals

Interoperability with systems like Stackless Python or greenlets/gevent is not a goal of this PEP.

Specification

Dependencies

Python 3.3 is required. No new language or standard library features beyond Python 3.3 are required. No third-party modules or packages are required.

Module Namespace

The specification here will live in a new toplevel package. Different components will live in separate submodules of that package. The package will import common APIs from their respective submodules and make them available as package attributes (similar to the way the email package works).

The name of the toplevel package is currently unspecified. The reference implementation uses the name 'tulip', but the name will change to something more boring if and when the implementation is moved into the standard library (hopefully for Python 3.4).

Until the boring name is chosen, this PEP will use 'tulip' as the toplevel package name. Classes and functions given without a module name are assumed to be accessed via the toplevel package.

Event Loop Policy: Getting and Setting the Event Loop

To get the current event loop, use get_event_loop(). This returns an instance of the EventLoop class defined below or an equivalent object. It is possible that get_event_loop() returns a different object depending on the current thread, or depending on some other notion of context.

To set the current event loop, use set_event_loop(event_loop), where event_loop is an instance of the EventLoop class or equivalent. This uses the same notion of context as get_event_loop().

For the benefit of unit tests and other special cases there's a third policy function: init_event_loop(), which creates a new EventLoop instance and calls set_event_loop() with it. TBD: Maybe we should have a create_default_event_loop_instance() function instead?

To change the way the above three functions work (including their notion of context), call set_event_loop_policy(policy), where policy is an event loop policy object. The policy object can be any object that has methods get_event_loop(), set_event_loop(event_loop) and init_event_loop() behaving like the functions described above. The default event loop policy is an instance of the class DefaultEventLoopPolicy. The current event loop policy object can be retrieved by calling get_event_loop_policy().

An event loop policy may but does not have to enforce that there is only one event loop in existence. The default event loop policy does not enforce this, but it does enforce that there is only one event loop per thread.

Event Loop Interface

(A note about times: as usual in Python, all timeouts, intervals and delays are measured in seconds, and may be ints or floats. The accuracy and precision of the clock are up to the implementation; the default implementation uses time.monotonic().)

A conforming event loop object has the following methods:

run(). Runs the event loop until there is nothing left to do. This means, in particular:
- No more calls scheduled with call_later(), call_repeatedly(), call_soon(), or call_soon_threadsafe(), except for cancelled calls.
- No more registered file descriptors. It is up to the registering party to unregister a file descriptor when it is closed.
Note: run() blocks until the termination condition is met, or until stop() is called.

Note: if you schedule a call with call_repeatedly(), run() will not exit until you cancel it.

TBD: How many variants of this do we really need?
stop(). Stops the event loop as soon as it is convenient. It is fine to restart the loop with run() (or one of its variants) subsequently.

Note: How soon exactly is up to the implementation. All immediate callbacks that were already scheduled to run before stop() is called must still be run, but callbacks scheduled after it is called (or scheduled to be run later) will not be run.
run_forever(). Runs the event loop until stop() is called.
run_until_complete(future, timeout=None). Runs the event loop until the Future is done. If a timeout is given, it waits at most that long. If the Future is done, its result is returned, or its exception is raised; if the timeout expires before the Future is done, or if stop() is called, TimeoutError is raised (but the Future is not cancelled). This cannot be called when the event loop is already running.

Note: This API is most useful for tests and the like. It should not be used as a substitute for yield from future or other ways to wait for a Future (e.g. registering a done callback).
run_once(timeout=None). Run the event loop for a little while. If a timeout is given, an I/O poll made will block at most that long; otherwise, an I/O poll is not constrained in time.

Note: Exactlly how much work this does is up to the implementation. One constraint: if a callback immediately schedules itself using call_soon(), causing an infinite loop, run_once() should still return.
call_later(delay, callback, *args). Arrange for callback(*args) to be called approximately delay seconds in the future, once, unless cancelled. Returns a Handler object representing the callback, whose cancel() method can be used to cancel the callback.
call_repeatedly(interval, callback, **args). Like call_later() but calls the callback repeatedly, every interval seconds, until the Handler returned is cancelled. The first call is in interval seconds.
call_soon(callback, *args). Equivalent to call_later(0, callback, *args).
call_soon_threadsafe(callback, *args). Like call_soon(callback, *args), but when called from another thread while the event loop is blocked waiting for I/O, unblocks the event loop. This is the only method that is safe to call from another thread or from a signal handler. (To schedule a callback for a later time in a threadsafe manner, you can use ev.call_soon_threadsafe(ev.call_later, when, callback, *args).)
TBD: A way to register a callback that is already wrapped in a Handler. Maybe call_soon() could just check isinstance(callback, Handler)? It should silently skip a cancelled callback.

Some methods in the standard conforming interface return Futures:

wrap_future(future). This takes a PEP 3148 Future (i.e., an instance of concurrent.futures.Future) and returns a Future compatible with the event loop (i.e., a tulip.Future instance).
run_in_executor(executor, function, *args). Arrange to call function(*args) in an executor (see PEP 3148). Returns a Future whose result on success is the return value that call. This is equivalent to wrap_future(executor.submit(function, *args)). If executor is None, a default ThreadPoolExecutor with 5 threads is used. (TBD: Should the default executor be shared between different event loops? Should we even have a default executor? Should be be able to set its thread count? Shoul we even have this method?)
set_default_executor(executor). Set the default executor used by run_in_executor().
getaddrinfo(host, port, family=0, type=0, proto=0, flags=0). Similar to the socket.getaddrinfo() function but returns a Future. The Future's result on success will be a list of the same format as returned by socket.getaddrinfo(). The default implementation calls socket.getaddrinfo() using run_in_executor(), but other implementations may choose to implement their own DNS lookup.
getnameinfo(sockaddr, flags=0). Similar to socket.getnameinfo() but returns a Future. The Future's result on success will be a tuple (host, port). Same implementation remarks as for getaddrinfo().
create_transport(protocol_factory, host, port, **kwargs). Creates a transport and a protocol and ties them together. Returns a Future whose result on success is a (transport, protocol) pair. Note that when the Future completes, the protocol's connection_made() method has not yet been called; that will happen when the connection handshake is complete. When it is impossible to connect to the given host and port, the Future will raise an exception instead.

Optional keyword arguments:
- family, type, proto, flags: Address familty, socket type, protcol, and miscellaneous flags to be passed through to getaddrinfo(). These all default to 0 except type which defaults to socket.SOCK_STREAM.
- ssl: Pass True to create an SSL transport (by default a plain TCP is created). Or pass an ssl.SSLContext object to override the default SSL context object to be used.
TBD: Should this be called create_connection()?
start_serving(...). Enters a loop that accepts connections. TBD: Signature. There are two possibilities:
1. You pass it a non-blocking socket that you have already prepared with bind() and listen() (these system calls do not block AFAIK), a protocol factory (I hesitate to use this word :-), and optional flags that control the transport creation (e.g. ssl).
2. Instead of a socket, you pass it a host and port, and some more optional flags (e.g. to control IPv4 vs IPv6, or to set the backlog value to be passed to listen()).
In either case, once it has a socket, it will wrap it in a transport, and then enter a loop accepting connections (the best way to implement such a loop depends on the platform). Each time a connection is accepted, a transport and protocol are created for it.

This should return an object that can be used to control the serving loop, e.g. to stop serving, abort all active connections, and (if supported) adjust the backlog or other parameters. It may also have an API to inquire about active connections. If version (2) is selected, it should probably return a Future whose result on success will be that control object, and which becomes done once the accept loop is started.

TBD: It may be best to use version (2), since on some platforms the best way to start a server may not involve sockets (but will still involve transports and protocols).

TBD: Be more specific.

TBD: Some platforms may not be interested in implementing all of these, e.g. start_serving() may be of no interest to mobile apps. (Although, there's a Minecraft server on my iPad...)

The following methods for registering callbacks for file descriptors are optional. If they are not implemented, accessing the method (without calling it) returns AttributeError. The default implementation provides them but the user normally doesn't use these directly -- they are used by the transport implementations exclusively. Also, on Windows these may be present or not depending on whether a select-based or IOCP-based event loop is used. These take integer file descriptors only, not objects with a fileno() method. The file descriptor should represent something pollable -- i.e. no disk files.

add_reader(fd, callback, *args). Arrange for callback(*args) to be called whenever file descriptor fd is ready for reading. Returns a Handler object which can be used to cancel the callback. Note that, unlike call_later(), the callback may be called many times. Calling add_reader() again for the same file descriptor implicitly cancels the previous callback for that file descriptor. (TBD: Returning a Handler that can be cancelled seems awkward. Let's forget about that.) (TBD: Change this to raise an exception if a handler is already set.)
add_writer(fd, callback, *args). Like add_reader(), but registers the callback for writing instead of for reading.
remove_reader(fd). Cancels the current read callback for file descriptor fd, if one is set. A no-op if no callback is currently set for the file descriptor. (The reason for providing this alternate interface is that it is often more convenient to remember the file descriptor than to remember the Handler object.) (TBD: Return True if a handler was removed, False if not.)
remove_writer(fd). This is to add_writer() as remove_reader() is to add_reader().
add_connector(fd, callback, *args). Like add_writer() but meant to wait for connect() operations, which on some platforms require different handling (e.g. WSAPoll() on Windows).
remove_connector(fd). This is to remove_writer() as add_connector() is to add_writer().

TBD: What about multiple callbacks per fd? The current semantics is that add_reader()/add_writer() replace a previously registered callback. Change this to raise an exception if a callback is already registered.

The following methods for doing async I/O on sockets are optional. They are alternative to the previous set of optional methods, intended for transport implementations on Windows using IOCP (if the event loop supports it). The socket argument has to be a non-blocking socket.

sock_recv(sock, n). Receive up to n bytes from socket sock. Returns a Future whose result on success will be a bytes object on success.
sock_sendall(sock, data). Send bytes data to the socket sock. Returns a Future whose result on success will be None. (TBD: Is it better to emulate sendall() or send() semantics? I think sendall() -- but perhaps it should still be named send()?)
sock_connect(sock, address). Connect to the given address. Returns a Future whose result on success will be None.
sock_accept(sock). Accept a connection from a socket. The socket must be in listening mode and bound to an address. Returns a Future whose result on success will be a tuple (conn, peer) where conn is a connected non-blocking socket and peer is the peer address. (TBD: People tell me that this style of API is too slow for high-volume servers. So there's also start_serving() above. Then do we still need this?)

TBD: Optional methods are not so good. Perhaps these should be required? It may still depend on the platform which set is more efficient.

Callback Sequencing

When two callbacks are scheduled for the same time, they are run in the order in which they are registered. For example::

ev.call_soon(foo) ev.call_soon(bar)

guarantees that foo() is called before bar().

If call_soon() is used, this guarantee is true even if the system clock were to run backwards. This is also the case for call_later(0, callback, *args). However, if call_later() is used with a nonzero delay, all bets are off if the system clock were to runs backwards. (A good event loop implementation should use time.monotonic() to avoid problems when the clock runs backward. See PEP 418.)

Context

All event loops have a notion of context. For the default event loop implementation, the context is a thread. An event loop implementation should run all callbacks in the same context. An event loop implementation should run only one callback at a time, so callbacks can assume automatic mutual exclusion with other callbacks scheduled in the same event loop.

Exceptions

There are two categories of exceptions in Python: those that derive from the Exception class and those that derive from BaseException. Exceptions deriving from Exception will generally be caught and handled appropriately; for example, they will be passed through by Futures, and they will be logged and ignored when they occur in a callback.

However, exceptions deriving only from BaseException are never caught, and will usually cause the program to terminate with a traceback. (Examples of this category include KeyboardInterrupt and SystemExit; it is usually unwise to treat these the same as most other exceptions.)

The Handler Class

The various methods for registering callbacks (e.g. call_later()) all return an object representing the registration that can be used to cancel the callback. For want of a better name this object is called a Handler, although the user never needs to instantiate instances of this class. There is one public method:

cancel(). Attempt to cancel the callback. TBD: Exact specification.

Read-only public attributes:

callback. The callback function to be called.
args. The argument tuple with which to call the callback function.
cancelled. True if cancel() has been called.

Note that some callbacks (e.g. those registered with call_later()) are meant to be called only once. Others (e.g. those registered with add_reader()) are meant to be called multiple times.

TBD: An API to call the callback (encapsulating the exception handling necessary)? Should it record how many times it has been called? Maybe this API should just be __call__()? (But it should suppress exceptions.)

TBD: Public attribute recording the realtime value when the callback is scheduled? (Since this is needed anyway for storing it in a heap.)

Futures

The tulip.Future class here is intentionally similar to the concurrent.futures.Future class specified by PEP 3148, but there are slight differences. The supported public API is as follows, indicating the differences with PEP 3148:

cancel(). TBD: Exact specification.
cancelled().
running(). Note that the meaning of this method is essentially "cannot be cancelled and isn't done yet". (TBD: Would be nice if this could be set and cleared in some cases, e.g. sock_recv().)
done().
result(). Difference with PEP 3148: This has no timeout argument and does not wait; if the future is not yet done, it raises an exception.
exception(). Difference with PEP 3148: This has no timeout argument and does not wait; if the future is not yet done, it raises an exception.
add_done_callback(fn). Difference with PEP 3148: The callback is never called immediately, and always in the context of the caller. (Typically, a context is a thread.) You can think of this as calling the callback through call_soon_threadsafe(). Note that the callback (unlike all other callbacks defined in this PEP, and ignoring the convention from the section "Callback Style" below) is always called with a single argument, the Future object.

The internal methods defined in PEP 3148 are not supported. (TBD: Maybe we do need to support these, in order to make it easy to write user code that returns a Future?)

A tulip.Future object is not acceptable to the wait() and as_completed() functions in the concurrent.futures package.

A tulip.Future object is acceptable to a yield from expression when used in a coroutine. This is implemented through the __iter__() interface on the Future. See the section "Coroutines and the Scheduler" below.

When a Future is garbage-collected, if it has an associated exception but neither result() nor exception() nor __iter__() has ever been called (or the latter hasn't raised the exception yet -- details TBD), the exception should be logged. TBD: At what level?

In the future (pun intended) we may unify tulip.Future and concurrent.futures.Future, e.g. by adding an __iter__() method to the latter that works with yield from. To prevent accidentally blocking the event loop by calling e.g. result() on a Future that's not don yet, the blocking operation may detect that an event loop is active in the current thread and raise an exception instead. However the current PEP strives to have no dependencies beyond Python 3.3, so changes to concurrent.futures.Future are off the table for now.

Transports

A transport is an abstraction on top of a socket or something similar (for example, a UNIX pipe or an SSL connection). Transports are strongly influenced by Twisted and PEP 3153. Users rarely implement or instantiate transports -- rather, event loops offer utility methods to set up transports.

Transports work in conjunction with protocols. Protocols are typically written without knowing or caring about the exact type of transport used, and transports can be used with a wide variety of protocols. For example, an HTTP client protocol implementation may be used with either a plain socket transport or an SSL transport. The plain socket transport can be used with many different protocols besides HTTP (e.g. SMTP, IMAP, POP, FTP, IRC, SPDY).

Most connections have an asymmetric nature: the client and server usually have very different roles and behaviors. Hence, the interface between transport and protocol is also asymmetric. From the protocol's point of view, writing data is done by calling the write() method on the transport object; this buffers the data and returns immediately. However, the transport takes a more active role in reading data: whenever some data is read from the socket (or other data source), the transport calls the protocol's data_received() method.

Transports have the following public methods:

write(data). Write some bytes. The argument must be a bytes object. Returns None. The transport is free to buffer the bytes, but it must eventually cause the bytes to be transferred to the entity at the other end, and it must maintain stream behavior. That is, t.write(b'abc'); t.write(b'def') is equivalent to t.write(b'abcdef'), as well as to::

t.write(b'a') t.write(b'b') t.write(b'c') t.write(b'd') t.write(b'e') t.write(b'f')
writelines(iterable). Equivalent to::

for data in iterable: self.write(data)
write_eof(). Close the writing end of the connection. Subsequent calls to write() are not allowed. Once all buffered data is transferred, the transport signals to the other end that no more data will be received. Some protocols don't support this operation; in that case, calling write_eof() will raise an exception. (Note: This used to be called half_close(), but unless you already know what it is for, that name doesn't indicate which end is closed.)
can_write_eof(). Return True if the protocol supports write_eof(), False if it does not. (This method is needed because some protocols need to change their behavior when write_eof() is unavailable. For example, in HTTP, to send data whose size is not known ahead of time, the end of the data is typically indicated using write_eof(); however, SSL does not support this, and an HTTP protocol implementation would have to use the "chunked" transfer encoding in this case. But if the data size is known ahead of time, the best approach in both cases is to use the Content-Length header.)
pause(). Suspend delivery of data to the protocol until a subsequent resume() call. Between pause() and resume(), the protocol's data_received() method will not be called. This has no effect on write().
resume(). Restart delivery of data to the protocol via data_received().
close(). Sever the connection with the entity at the other end. Any data buffered by write() will (eventually) be transferred before the connection is actually closed. The protocol's data_received() method will not be called again. Once all buffered data has been flushed, the protocol's connection_lost() method will be called with None as the argument. Note that this method does not wait for all that to happen.
abort(). Immediately sever the connection. Any data still buffered by the transport is thrown away. Soon, the protocol's connection_lost() method will be called with None as argument. (TBD: Distinguish in the connection_lost() argument between close(), abort() or a close initated by the other end? Or add a transport method to inquire about this? Glyph's proposal was to pass different exceptions for this purpose.)

TBD: Provide flow control the other way -- the transport may need to suspend the protocol if the amount of data buffered becomes a burden. Proposal: let the transport call protocol.pause() and protocol.resume() if they exist; if they don't exist, the protocol doesn't support flow control. (Perhaps different names to avoid confusion between protocols and transports?)

Protocols

Protocols are always used in conjunction with transports. While a few common protocols are provided (e.g. decent though not necessarily excellent HTTP client and server implementations), most protocols will be implemented by user code or third-party libraries.

A protocol must implement the following methods, which will be called by the transport. Consider these callbacks that are always called by the event loop in the right context. (See the "Context" section above.)

connection_made(transport). Indicates that the transport is ready and connected to the entity at the other end. The protocol should probably save the transport reference as an instance variable (so it can call its write() and other methods later), and may write an initial greeting or request at this point.
data_received(data). The transport has read some bytes from the connection. The argument is always a non-empty bytes object. There are no guarantees about the minimum or maximum size of the data passed along this way. p.data_received(b'abcdef') should be treated exactly equivalent to::

p.data_received(b'abc') p.data_received(b'def')
eof_received(). This is called when the other end called write_eof() (or something equivalent). The default implementation calls close() on the transport, which causes connection_lost() to be called (eventually) on the protocol.
connection_lost(exc). The transport has been closed or aborted, has detected that the other end has closed the connection cleanly, or has encountered an unexpected error. In the first three cases the argument is None; for an unexpected error, the argument is the exception that caused the transport to give up. (TBD: Do we need to distinguish between the first three cases?)

Here is a chart indicating the order and multiplicity of calls:

connection_made() -- exactly once
data_received() -- zero or more times
eof_received() -- at most once
connection_lost() -- exactly once

TBD: Discuss whether user code needs to do anything to make sure that protocol and transport aren't garbage-collected prematurely.

Callback Style

Most interfaces taking a callback also take positional arguments. For instance, to arrange for foo("abc", 42) to be called soon, you call ev.call_soon(foo, "abc", 42). To schedule the call foo(), use ev.call_soon(foo). This convention greatly reduces the number of small lambdas required in typical callback programming.

This convention specifically does not support keyword arguments. Keyword arguments are used to pass optional extra information about the callback. This allows graceful evolution of the API without having to worry about whether a keyword might be significant to a callee somewhere. If you have a callback that must be called with a keyword argument, you can use a lambda or functools.partial. For example::

ev.call_soon(functools.partial(foo, "abc", repeat=42))

Choosing an Event Loop Implementation

TBD. (This is about the choice to use e.g. select vs. poll vs. epoll, and how to override the choice. Probably belongs in the event loop policy.)

Coroutines and the Scheduler

This is a separate toplevel section because its status is different from the event loop interface. Usage of coroutines is optional, and it is perfectly fine to write code using callbacks only. On the other hand, there is only one implementation of the scheduler/coroutine API, and if you're using coroutines, that's the one you're using.

Coroutines

A coroutine is a generator that follows certain conventions. For documentation purposes, all coroutines should be decorated with @tulip.coroutine, but this cannot be strictly enforced.

Coroutines use the yield from syntax introduced in PEP 380, instead of the original yield syntax.

The word "coroutine", like the word "generator", is used for two different (though related) concepts:

The function that defines a coroutine (a function definition decorated with tulip.coroutine). If disambiguation is needed, we call this a coroutine function.
The object obtained by calling a coroutine function. This object represents a computation or an I/O operation (usually a combination) that will complete eventually. For disambiguation we call it a coroutine object.

Things a coroutine can do:

result = yield from future -- suspends the coroutine until the future is done, then returns the future's result, or raises its exception, which will be propagated.
result = yield from coroutine -- wait for another coroutine to produce a result (or raise an exception, which will be propagated). The coroutine expression must be a call to another coroutine.
results = yield from tulip.par(futures_and_coroutines) -- Wait for a list of futures and/or coroutines to complete and return a list of their results. If one of the futures or coroutines raises an exception, that exception is propagated, after attempting to cancel all other futures and coroutines in the list.
return result -- produce a result to the coroutine that is waiting for this one using yield from.
raise exception -- raise an exception in the coroutine that is waiting for this one using yield from.

Calling a coroutine does not start its code running -- it is just a generator, and the coroutine object returned by the call is really a generator object, which doesn't do anything until you iterate over it. In the case of a coroutine object, there are two basic ways to start it running: call yield from coroutine from another coroutine (assuming the other coroutine is already running!), or convert it to a Task.

Coroutines can only run when the event loop is running.

Tasks

A Task is an object that manages an independently running coroutine. The Task interface is the same as the Future interface. The task becomes done when its coroutine returns or raises an exception; if it returns a result, that becomes the task's result, if it raises an exception, that becomes the task's exception.

Cancelling a task that's not done yet prevents its coroutine from completing; in this case an exception is thrown into the coroutine that it may catch to further handle cancellation, but it doesn't have to (this is done using the standard close() method on generators, described in PEP 342).

The par() function described above runs coroutines in parallel by converting them to Tasks. (Arguments that are already Tasks or Futures are not converted.)

Tasks are also useful for interoperating between coroutines and callback-based frameworks like Twisted. After converting a coroutine into a Task, callbacks can be added to the Task.

You may ask, why not convert all coroutines to Tasks? The @tulip.coroutine decorator could do this. This would slow things down considerably in the case where one coroutine calls another (and so on), as switching to a "bare" coroutine has much less overhead than switching to a Task.

The Scheduler

The scheduler has no public interface. You interact with it by using yield from future and yield from task. In fact, there is no single object representing the scheduler -- its behavior is implemented by the Task and Future classes using only the public interface of the event loop, so it will work with third-party event loop implementations, too.

Sleeping

TBD: yield sleep(seconds). Can use sleep(0) to suspend to poll for I/O.

Wait for First

TBD: Need an interface to wait for the first of a collection of Futures.

Coroutines and Protocols

The best way to use coroutines to implement protocols is probably to use a streaming buffer that gets filled by data_received() and can be read asynchronously using methods like read(n) and readline() that return a Future. When the connection is closed, read() should return a Future whose result is b'', or raise an exception if connection_closed() is called with an exception.

To write, the write() method (and friends) on the transport can be used -- these do not return Futures. A standard protocol implementation should be provided that sets this up and kicks off the coroutine when connection_made() is called.

TBD: Be more specific.

Cancellation

TBD. When a Task is cancelled its coroutine may see an exception at any point where it is yielding to the scheduler (i.e., potentially at any yield from operation). We need to spell out which exception is raised.

Also TBD: timeouts.

Open Issues

A debugging API? E.g. something that logs a lot of stuff, or logs unusual conditions (like queues filling up faster than they drain) or even callbacks taking too much time...
Do we need introspection APIs? E.g. asking for the read callback given a file descriptor. Or when the next scheduled call is. Or the list of file descriptors registered with callbacks.
Should we have future.add_callback(callback, *args), using the convention from the section "Callback Style" above, or should we stick with the PEP 3148 specification of future.add_done_callback(callback) which calls callback(future)? (Glyph suggested using a different method name since add_done_callback() does not guarantee that the callback will be called in the right context.)
Returning a Future is relatively expensive, and it is quite possible that some types of calls usually complete immediately (e.g. writing small amounts of data to a socket). A trick used by Richard Oudkerk in the tulip project's proactor branch makes calls like recv() either return a regular result or raise a Future. The caller (likely a transport) must then write code like this::

try: res = ev.sock_recv(sock, 8192) except Future as f: yield from sch.block_future(f) res = f.result()
Do we need a larger vocabulary of operations for combining coroutines and/or futures? E.g. in addition to par() we could have a way to run several coroutines sequentially (returning all results or passing the result of one to the next and returning the final result?). We might also introduce explicit locks (though these will be a bit of a pain to use, as we can't use the with lock: block syntax). Anyway, I think all of these are easy enough to write using Task.

Proposal: f = yield from wait_one(fs) takes a set of Futures and sets f to the first of those that is done. (Yes, this requires an intermediate Future to wait for.) You can then write::

while fs: f = tulip.wait_one(fs) fs.remove(f)
Support for datagram protocols, "connected" or otherwise? Probably need more socket I/O methods, e.g. sock_sendto() and sock_recvfrom(). Or users can write their own (it's not rocket science). Is it reasonable to map write(), writelines(), data_received() to single datagrams?
Task or callback priorities? (I hope not.)
An EventEmitter in the style of NodeJS? Or make this a separate PEP? It's easy enough to do in user space, though it may benefit from standardization. (See https://github.com/mnot/thor/blob/master/thor/events.py and https://github.com/mnot/thor/blob/master/doc/events.md for examples.)

Acknowledgments

Apart from PEP 3153, influences include PEP 380 and Greg Ewing's tutorial for yield from, Twisted, Tornado, ZeroMQ, pyftpdlib, tulip (the author's attempts at synthesis of all these), wattle (Steve Dower's counter-proposal), numerous discussions on python-ideas from September through December 2012, a Skype session with Steve Dower and Dino Viehland, email exchanges with Ben Darnell, an audience with Niels Provos (original author of libevent), and two in-person meetings with several Twisted developers, including Glyph, Brian Warner, David Reid, and Duncan McGreggor. Also, the author's previous work on async support in the NDB library for Google App Engine was an important influence.

Copyright

This document has been placed in the public domain.

.. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

Previous message: [Python-Dev] What is the "sequence"? ([issue16728] collections.abc.Sequence shoud provide __subclasshook__
Next message: [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list