[Python-Dev] generic async io (was: microthreading vs. async io)

Nick Maclaren nmm1 at cus.cam.ac.uk
Thu Feb 15 20:46:59 CET 2007

Previous message: [Python-Dev] microthreading vs. async io
Next message: [Python-Dev] generic async io (was: microthreading vs. async io)

> I think this discussion would be facilitated by teasing the first bullet-point from the latter two: the first deals with async IO, while the latter two deal with cooperative multitasking. It's easy to write a single package that does both, but it's much harder to write two fairly generic packages with a clean API between them, given the varied platform support for async IO and the varied syntax and structures (continuations vs. microthreads, in my terminology) for multitasking. Yet I think that division is exactly what's needed.

Hmm. Now, please, people, don't take offence, but I don't know how to phrase this tactfully :-(

The 'threading' approach to asynchronous I/O was found to be a BAD IDEA back in the 1970s, was abandoned in favour of separating asynchronous I/O from threading, and God alone knows why it was reinvented - except that most of the people with prior experience had died or retired :-(

Let's go back to the days when asynchronous I/O was the norm, and I/O-performance-critical applications drove the devices directly. In those days, yes, that approach did make sense. But it rapidly ceased to do so with the advent of 'semi-intelligent' devices and the virtualisation of I/O by the operating system. That was in the mid-1970s. Nowadays, ALL devices are semi-intelligent, and no system since Unix has allowed applications direct access to devices, except in specialised HPC and graphics work.

We used to get 90% of theoretical peak performance on mainframes using asynchronous I/O from clean, portable applications, but it was NOT done by treating the I/O as threads and controlling their synchronisation by hand. In fact, quite the converse! It was done by realising that asynchronous I/O and explicit threading are best separated ENTIRELY. There were two main models:

The first is streaming, as in most languages (Fortran, C and Python, but NOT POSIX). The key properties here are that transfer boundaries have no significance; that only heavyweight synchronisation primitives (open, close, etc.) impose any constraints on when data are actually transferred; and that (for very high performance) a buffer is unavailable from when a transfer is started to when it is checked. If copying is acceptable, the last constraint can be dropped.
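A minimal Python sketch of the first two streaming properties, using the standard io.BufferedWriter as the reblocking library. RecordingRaw and buffered_stream are hypothetical names invented for illustration: RecordingRaw stands in for the device and records the size of each transfer it actually receives. The application's write() boundaries do not determine those transfers, and only the explicit flush() guarantees that all the data have been passed on.

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical stand-in for a device: records every transfer it receives."""

    def __init__(self):
        super().__init__()
        self.data = bytearray()
        self.transfers = []  # size in bytes of each low-level write

    def writable(self):
        return True

    def write(self, b):
        self.data += b  # copy now; the caller may reuse its buffer afterwards
        self.transfers.append(len(b))
        return len(b)


def buffered_stream(chunks, buffer_size=16):
    """Write chunks through a BufferedWriter.

    Returns the device and the number of bytes that had reached it
    before the explicit flush.
    """
    raw = RecordingRaw()
    stream = io.BufferedWriter(raw, buffer_size=buffer_size)
    for chunk in chunks:
        stream.write(chunk)  # application-level boundaries; may only be buffered
    reached_before_flush = len(raw.data)
    stream.flush()  # heavyweight synchronisation: everything is now transferred
    return raw, reached_before_flush


if __name__ == "__main__":
    chunks = [b"alpha", b"be", b"gamma-delta", b"e", b"zeta-eta-theta-iota"]
    raw, before = buffered_stream(chunks)
    print("application writes:", [len(c) for c in chunks])
    print("device transfers:  ", raw.transfers)
    print("bytes before flush:", before, "of", len(b"".join(chunks)))
```

The exact transfer sizes depend on the buffering implementation, which is rather the point: the application should not need to care about them.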

In the simple case, these properties allow the library/system to reblock and perform transfers asynchronously. In the more advanced case, the application has to use multiple buffering (at least double), but can get full performance without any form of threading. IBM MVT applications used to get up to 90% of theoretical peak without hassle, in parallel with computation and using only a single thread (well, there was only a single CPU, anyway).
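The double-buffering discipline can be modelled in a single thread. This is a toy simulation, not real overlapped I/O: SimulatedDevice, its start()/check() methods and write_blocks are hypothetical, and in a real system the transfer would proceed in the OS or hardware while the application computes. The simulated device deliberately reads a buffer only at check() time, so the output is correct only because the application never touches a buffer between starting its transfer and checking it.

```python
class SimulatedDevice:
    """Hypothetical device: a transfer is started, then later checked.

    The buffer's contents are copied out only when check() is called, so
    modifying a buffer while its transfer is outstanding would corrupt
    the output.
    """

    def __init__(self):
        self.output = bytearray()
        self._pending = {}  # handle -> buffer currently owned by the device
        self._next_handle = 0

    def start(self, buf):
        handle = self._next_handle
        self._next_handle += 1
        self._pending[handle] = buf
        return handle

    def check(self, handle):
        buf = self._pending.pop(handle)
        self.output += buf  # the transfer "completes" here


def write_blocks(blocks, device):
    """Double-buffered output: block k is produced while block k-1 is in flight."""
    blocks = list(blocks)
    buffers = [bytearray(), bytearray()]
    pending = [None, None]  # handle of the transfer that owns each buffer
    for k, block in enumerate(blocks):
        i = k % 2
        if pending[i] is not None:
            device.check(pending[i])  # wait until buffer i is ours again
        buffers[i][:] = block  # "compute" the next block into the free buffer
        pending[i] = device.start(buffers[i])
    # Final synchronisation: check the (at most two) outstanding transfers
    # in the order they were started.
    n = len(blocks)
    for k in range(max(0, n - 2), n):
        device.check(pending[k % 2])


if __name__ == "__main__":
    dev = SimulatedDevice()
    write_blocks([b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"], dev)
    print(bytes(dev.output))
```

With two buffers the application fills one block while the previous one is (notionally) in flight; more buffers would allow more transfers to be outstanding at once.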

The other model is transactions. Here there is a global commit primitive, and the order of transfers between commits is undefined. Inter alia, this means that overlapping transfers are undefined behaviour, whether in a single thread or in multiple threads. BSP (Bulk Synchronous Parallel) uses this model.
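A toy model of the transaction discipline, in the same hedged spirit: TransactionalStore is a hypothetical class, not the interface of BSP or MPI-2. Writes issued between commits are applied in a random order at commit(), and since overlapping writes within one epoch would have no defined result, the sketch rejects them rather than picking a winner. No threads are involved; the rule is the same whether the writes come from one thread or several.

```python
import random


class TransactionalStore:
    """Hypothetical toy store illustrating the transaction I/O model."""

    def __init__(self, size, seed=None):
        self.data = bytearray(size)  # state visible after the last commit
        self._pending = []  # (offset, payload) writes issued since then
        self._rng = random.Random(seed)

    def write(self, offset, payload):
        """Queue a write; it is not visible until the next commit()."""
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise ValueError("write outside the store")
        for other, other_payload in self._pending:
            # Half-open ranges [a, b) and [c, d) overlap iff a < d and c < b.
            if offset < other + len(other_payload) and other < end:
                raise ValueError("overlapping writes between commits are undefined")
        self._pending.append((offset, bytes(payload)))

    def commit(self):
        """Global commit: apply all pending writes, in no particular order."""
        self._rng.shuffle(self._pending)  # the order is deliberately unspecified
        for offset, payload in self._pending:
            self.data[offset:offset + len(payload)] = payload
        self._pending.clear()


if __name__ == "__main__":
    store = TransactionalStore(8, seed=1)
    store.write(4, b"efgh")
    store.write(0, b"abcd")
    print(bytes(store.data))  # nothing visible yet: eight zero bytes
    store.commit()
    print(bytes(store.data))  # b'abcdefgh', whatever order was chosen
```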

The MPI-2 design team included a lot of ex-mainframe people, and MPI-2 specifies both models. While MPI is designed for parallel applications, its I/O per se is not controlled like threads.

Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679
