[Python-Dev] generic async io

Joachim Koenig-Baltes joachim.koenig-baltes at emesgarten.de
Thu Feb 15 21:15:32 CET 2007


dustin at v.igoro.us wrote:

I think this discussion would be facilitated by teasing apart the first bullet point from the latter two: the first deals with async IO, while the latter two deal with cooperative multitasking.

It's easy to write a single package that does both, but it's much harder to write two fairly generic packages with a clean API between them, given the varied platform support for async IO and the varied syntax and structures (continuations vs. microthreads, in my terminology) for multitasking. Yet I think that division is exactly what's needed.

Since you asked (I'll assume the check for $0.02 is in the mail), I think a strictly-async-IO library would offer the following:

- a sleep queue object to which callables can be added
- wrappers for all/most of the stdlib blocking IO operations, which add the operation to the list of outstanding operations and return a sleep queue object
- some relatively easy method of extending that for new IO operations
- a poll() function (for multitasking libraries) and a serveforever() loop (for asyncore-like uses, where all the action is IO-driven)
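To make the proposal above concrete, here is a minimal sketch of what such a library's core might look like. The names (SleepQueue, Reactor, read, poll) are hypothetical, and select.select stands in for whatever readiness mechanism a real implementation would use:

```python
import select
import socket

class SleepQueue:
    """Holds callables to be invoked when the associated IO completes."""
    def __init__(self):
        self.callbacks = []

    def add(self, callback):
        self.callbacks.append(callback)

    def fire(self, result):
        for cb in self.callbacks:
            cb(result)

class Reactor:
    """Tracks outstanding read operations and polls for their completion."""
    def __init__(self):
        self.readers = {}  # fd -> (socket, SleepQueue)

    def read(self, sock):
        """Wrapper for a blocking read: register interest, return a SleepQueue."""
        q = SleepQueue()
        self.readers[sock.fileno()] = (sock, q)
        return q

    def poll(self, timeout=0):
        """Check outstanding operations once; fire queues for ready fds."""
        if not self.readers:
            return
        ready, _, _ = select.select(list(self.readers), [], [], timeout)
        for fd in ready:
            sock, q = self.readers.pop(fd)
            q.fire(sock.recv(4096))
```

A serveforever() loop would simply call poll() with a blocking timeout in a while loop; a multitasking library would call poll(0) from its own scheduler.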

A centralized approach of wrapping all blocking IO operations in the stdlib could only work for pure Python applications. What about extensions that integrate e.g. gtk2, gstreamer and other useful libraries that come with their own low-level IO? Python is not the right place to solve this problem, and many C libraries have tried it, e.g. gnu-pth, which implements pthreads on a single-threaded OS.

But none of these approaches is perfect. E.g. if you want to read 5 bytes from an fd, you can use FIONREAD on a socket to get the number of bytes available from the OS, so you can be sure not to block; but FIONREAD on a regular file fd (e.g. on an NFS mount) will not tell you how many bytes the OS has prefetched, so you might block even if you are reading only 1 byte.
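The FIONREAD query mentioned above is available from Python via the fcntl module. A Unix-only sketch (the helper name bytes_available is my own):

```python
import fcntl
import socket
import struct
import termios

def bytes_available(sock):
    """Ask the kernel how many bytes can be read without blocking (FIONREAD)."""
    # ioctl fills the packed int buffer with the byte count.
    buf = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack('i', 0))
    return struct.unpack('i', buf)[0]
```

This is reliable for sockets; as noted above, for a regular file fd the number returned does not reflect what the OS has actually prefetched, so it is no guarantee against blocking there.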

I think it's best to let each task decide how to do its low-level IO case by case. The task knows what it's doing and how to avoid blocking.

Therefore I propose to decouple waiting for a condition/event from the actual blocking operation. To avoid the blocking there is no need to reinvent the wheel: the socket module already provides ways to avoid it for network IO, and a lot of C libraries exist that do it in a portable way, though none is perfect.
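The separation argued for here can be shown in a few lines using nothing but the stdlib: select waits for the event, and the task itself performs the (now guaranteed non-blocking) read. The function name read_when_ready is my own:

```python
import select
import socket

def read_when_ready(sock, timeout):
    """Decouple the event wait (select) from the IO operation (recv)."""
    sock.setblocking(False)              # socket module already supports this
    ready, _, _ = select.select([sock], [], [], timeout)
    if not ready:
        return None                      # the event did not occur in time
    return sock.recv(4096)               # will not block: data is waiting
```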

And based on these events it's much easier to design a scheduler than to write one which also has to perform the non-blocking IO operations itself in order to give the tasks the illusion of a blocking operation.
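A minimal sketch of such an event-only scheduler, assuming tasks are generators that yield the fd they want to wait on and then do their own IO once resumed (all names here are hypothetical):

```python
import select

def scheduler(tasks):
    """Run generator tasks; the scheduler only waits on events.

    Each task yields an object select() can wait on (socket, fd);
    the task itself performs the IO after it is resumed.
    """
    waiting = {}              # fd -> suspended task
    runnable = list(tasks)
    while runnable or waiting:
        for task in runnable:
            try:
                fd = next(task)           # run until the task waits on an fd
                waiting[fd] = task
            except StopIteration:
                pass                      # task finished
        runnable = []
        if waiting:
            ready, _, _ = select.select(list(waiting), [], [])
            for fd in ready:
                runnable.append(waiting.pop(fd))
```

Note that the scheduler never calls recv() or read() itself; deciding how to do the IO without blocking stays inside the task, as argued above.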

The BSD kevent is the most powerful event-waiting mechanism with kernel support (as it unifies waiting on different events on different resources like fds, processes, timers and signals), but its API can be emulated to a certain degree with the other event mechanisms, like inotify on Linux or Niels Provos' libevent.

The real showstopper for making the local event waiting easy is the lack of coroutines, or at least a form of non-local goto like setjmp/longjmp in C (that's what greenlets provide). Remember that yield only suspends the current function, so every function on the stack must be prepared to handle the yield, even if it is not interested in it (hiding this fact with decorators does not make it better, IMO).
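The yield limitation described here is easy to demonstrate: if a leaf function wants to suspend, every caller above it must become a generator too and re-yield on its behalf, even though only the leaf cares about the suspension (function names are illustrative):

```python
def leaf():
    """The only function that actually wants to suspend."""
    yield "waiting for data"      # suspension point

def middle():
    """Has no interest in the event, but must forward the yield anyway."""
    for event in leaf():          # cannot simply call leaf(); must re-yield
        yield event

def top():
    """Same story one level up: the suspension infects the whole stack."""
    for event in middle():
        yield event
```

With real coroutines (or greenlets), leaf() could suspend the whole stack directly and middle()/top() would remain ordinary functions.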

Joachim


