[Python-Dev] Pythonic concurrency - cooperative MT
Michael Sparks ms at cerenity.org
Sun Oct 2 01:13:15 CEST 2005
On Saturday 01 October 2005 22:50, Martin Blais wrote: ...
> because some people on the list appear to discuss generators as a concurrency scheme, and as far as I know they really are not addressing that at all.
Our project started in the context of a naturally concurrent environment. Specifically, the task is dealing with large numbers of concurrent connections to a server. As a result, when I've mentioned concurrency it's been from that viewpoint.
In the past I have worked with systems structured in essentially a similar way to Twisted for this kind of problem, but decided against that style for our current project. (Note: some people misunderstand my opinions here due to a badly phrased lightning talk ~16 months ago at Europython 2004. I think Twisted is very much best of breed in Python for what it does; I just think there //might// be a nicer way. (might :) )
Since I now work in an R&D dept, I wondered what would happen if, instead of the basic approach that underlies systems like Twisted, you took a much more CSP-like approach to building such systems, but using generators rather than threads or explicit state machines.
A specific goal was to try to make code simpler for people to work with, with simplified maintenance as the main intended by-product. I hadn't heard of anyone trying this approach then, and hypothesised it might achieve that goal.
From day 1 it became clear that where an event based system would normally use a reactor/proactor based approach, you replace that with a scheduler that repeatedly calls a next method on objects given to it to schedule.
In terms of concurrency that is clearly a co-operative multitasking system, in the same way as a simplistic event based system is. (Both get more complex in reality when you can't avoid blocking, forcing the use of threads for some tasks.)
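To make that concrete, here is a minimal, hypothetical sketch of such a scheduler (illustrative only - not Kamaelia's actual Axon code): it round-robins over generators, calling next() on each until it finishes.

```python
# A minimal cooperative scheduler: each generator is a "microprocess"
# that yields to hand control back; the scheduler gives each a turn.
class Scheduler:
    def __init__(self):
        self.active = []

    def activate(self, gen):
        self.active.append(gen)

    def run(self):
        while self.active:
            # Iterate over a copy so we can retire finished generators.
            for gen in self.active[:]:
                try:
                    next(gen)              # give this microprocess a slice of CPU
                except StopIteration:
                    self.active.remove(gen)  # generator finished; retire it

def counter(name, n, log):
    for i in range(n):
        log.append((name, i))
        yield  # hand control back to the scheduler

log = []
sched = Scheduler()
sched.activate(counter("A", 2, log))
sched.activate(counter("B", 2, log))
sched.run()
# The two tasks interleave: A and B each get one turn per pass.
```

Every yield is a voluntary context switch, which is exactly what makes this co-operative rather than pre-emptive.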
So when you say this:
> explicitly frame the discussion in the context of a single process cooperative scheme that runs on a single processor (at any one time).
This is spot on. However, any generator can be farmed off and run in a thread. Any communications you did with the generator can then be wrapped via Queues, forming a controlled bridge between the threads. Similarly, we're currently looking at using non-blocking pipes and pickling to communicate with generators running in a forked environment.
As a result, if you write your code as generators it can migrate to a threaded or process based environment, and scale across multiple processes (and hence processors), if tools to perform this migration are put in place. We're a little way off doing that, but it looks highly feasible.
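A hypothetical sketch of that thread migration (the helper names here are illustrative, not Kamaelia API): the same generator that could run under a cooperative scheduler is instead driven from its own thread, with thread-safe Queues bridging its communications.

```python
import threading
import queue

def doubler():
    # A generator "component": receives a value via send(), yields the result.
    result = None
    while True:
        item = (yield result)
        if item is None:   # sentinel: shut down cleanly
            return
        result = item * 2

def run_in_thread(gen, inbox, outbox):
    # Drive the generator from a dedicated thread; Queues form the
    # controlled bridge between the threads (illustrative helper).
    def pump():
        next(gen)  # prime the generator to its first yield
        while True:
            item = inbox.get()
            try:
                outbox.put(gen.send(item))
            except StopIteration:
                break
    t = threading.Thread(target=pump)
    t.start()
    return t

inbox, outbox = queue.Queue(), queue.Queue()
t = run_in_thread(doubler(), inbox, outbox)
for v in (1, 2, 3):
    inbox.put(v)
inbox.put(None)  # ask the generator to finish
t.join()
results = [outbox.get() for _ in range(3)]
```

The generator itself is unchanged; only the driver differs, which is what makes the single-threaded-to-threaded migration mechanical.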
> As far as I understand, generators are just a convenient way to
They give you code objects that can do a return and continue later. This isn't really the same as the ability to just do a goto into random points in a function. You can only go back to the point the generator yielded at (unless someone has a perverse trick :-).
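A tiny example of that resume-at-the-yield behaviour:

```python
def two_step():
    yield 1   # execution suspends here on the first next() ...
    yield 2   # ... and resumes exactly here on the second

g = two_step()
first = next(g)   # runs up to the first yield
second = next(g)  # resumes after it, up to the second yield
```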
> You could easily implement something very similar to generators by encapsulating the local scope explicitly in the form of a class, with instance attributes, and having a normal method "step()" that would be careful about saving state in the object's attributes every time it returns and restoring state from those attributes every time it gets called.
For a more explicit version of this, we have a (deliberately naive) C++ version of generators and our core concurrency system, which does precisely that (except we use a next() method there :-). Mechanism: http://tinyurl.com/7gaol , example use: http://tinyurl.com/bgwro
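In Python the class-based equivalent Martin describes looks like this (a deliberately naive sketch, mirroring the C++ version linked above): "local" state lives in instance attributes, and an explicit next() method replaces yield.

```python
class Counter:
    """Hand-rolled equivalent of: def counter(n): for i in range(n): yield i"""
    def __init__(self, n):
        # All "local variables" are saved explicitly as attributes.
        self.i = 0
        self.n = n

    def next(self):
        # Restore state from self.i, do one step, save state back.
        if self.i >= self.n:
            raise StopIteration
        value = self.i
        self.i += 1
        return value

c = Counter(3)
values = []
try:
    while True:
        values.append(c.next())
except StopIteration:
    pass
```

The generator form does this bookkeeping for you, which is most of its charm.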
> Therefore, as far as I understand it, generators themselves DO NOT implement any form of concurrency.
By themselves, they don't. They can be used to deal with concurrency though.
> 2. In a more advanced framework/language, perhaps some generators could be considered to always be possible to run asynchronously, ruled by a system of true concurrency with some kind of scheduling algorithm that oversees that process. Whether this has been implemented by many is still a mystery to me,
This is what we do. Trainees given our tutorial (one of whom had very little experience of programming at all) were able to pick up our approach quickly. The tutorial requires them to implement a mini-version of the framework, which might actually aid the discussion here since it very clearly shows the core of our system (nb, it is a simplified version). I previously posted a link to it, which is here: http://kamaelia.sourceforge.net/MiniAxon/
> but I can see how a low-level library that provides asynchronously running execution vehicles for each CPU could be used to manage and run a pool of shared generator objects in a way that is better (for a specific application) than the relatively uninformed scheduling provided by the threads abstraction (more at the end).
> Pseudo or cooperative concurrency is not the same as true asynchronous concurrency.
Correct. I've had discussions with a colleague at work who wants to work on the underlying formal semantics of our system for verification purposes, and he pointed out that a pure generator approach effectively serialises the application, which may hide problems that would surface in a truly parallel approach (e.g. a CSP-like system using only processes).
However, that statement rests on an underlying assumption: that the system is a pure generator system. As soon as you involve multiple systems using network connections, threads (since we have threaded components as well), and processes (which have always been on the cards - all our desktop machines are dual processor, and it just seems a waste to use only one), the system goes truly asynchronous.
As a result we (at least :-) have thought about these problems along the way.
> you have to deal with the way that they might access some same piece of data at the same time.
We do. We have both an underlying approach to deal with this and a metaphor that encourages correct usage. The underlying mechanism is based on explicit hand off of data between asynchronous activities. Once you have handed off a piece of data, you no longer own it and can no longer change it. If you are handed a piece of data you can do anything you like with it, for as long as you like until you hand it off or throw it away.
The metaphor of old-fashioned paper based inboxes (or in-trays) and outboxes (or out-trays) conceptually reinforces this idea - naturally encouraging safer programming styles.
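A simplified sketch of that inbox/outbox hand-off (illustrative, boiled down from the component model in the MiniAxon tutorial): once a component puts data in its outbox, it no longer touches it; a "postman" delivers it to the receiver's inbox, which then owns it.

```python
class Component:
    def __init__(self):
        self.inbox = []
        self.outbox = []

    def send(self, msg):
        self.outbox.append(msg)   # hand the data off; we no longer own it

    def recv(self):
        return self.inbox.pop(0)  # take ownership of the next message

    def dataReady(self):
        return bool(self.inbox)

def postman(src, dest):
    # Deliver everything in src's outbox to dest's inbox.
    while src.outbox:
        dest.inbox.append(src.outbox.pop(0))

producer, consumer = Component(), Component()
producer.send("hello")
postman(producer, consumer)
received = consumer.recv() if consumer.dataReady() else None
```

Because ownership transfers with the message, there is always exactly one reader and one writer for any item of data.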
This means that we only ever have a single reader and single writer for any item of data, which eliminates whole swathes of concurrency issues - whether you're pseudo-concurrent (ie threads[*], generators) or truly concurrent (processes on multiple processors).

[*] Still only 1 CPU really.
Effectively there is no global data. If there is any global data (we do have a global address space that we tend to think of as similar to a Linda tuple space), then it has a single owner. Others may read it, but only one may write to it. Because this is Python, this is enforced only by convention (but the use is discouraged and rarely needed).
The use of generators also effectively hides local variables from accidental external modification, which is a secondary layer of protection.
> If you do that then you might not have to lock access to your data structures at all.
We don't have to lock data structures at all, because we have explicit hand off of data. If we hand off between processes, we do this via Queues that handle the locking issues for us.
> To implement that explicitly, you would need an asynchronous version of all the functions that may block on resources (e.g. file open, socket write, etc.)
Or you can create a generator that handles reading from a file and hands the data off to the next component explicitly. The file reader is given CPU time by the scheduler. This can seem odd unless you've done shell programming, in which case the idea should be obvious:
echo $(ls *.py | while read i; do wc -l $i | cut -d ' ' -f1; done) | sed -e 's/ /+/g' | bc
(yes I know there's better ways of doing this :)
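A generator-pipeline analogue of that one-liner (a sketch - the function names are mine, not the framework's): one generator produces per-file line counts, the next stage sums them, and each stage hands its output on explicitly.

```python
def line_counts(files):
    # One stage of the pipeline: emit a line count per file,
    # like `wc -l` in the shell version. `files` stands in for
    # actually opening and reading files on disk.
    for name, text in files:
        yield len(text.splitlines())

def running_sum(counts):
    # Next stage: consume counts handed to us, yield the running total.
    total = 0
    for c in counts:
        total += c
        yield total

files = [("a.py", "x\ny\n"), ("b.py", "z\n")]
totals = list(running_sum(line_counts(files)))
final = totals[-1]
```

Each stage only runs when the stage downstream pulls from it, so the whole pipeline cooperates within a single thread, just like the shell pipeline cooperates between processes.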
So all in all, I'd say "yes", generators aren't really concurrent, but they are (IMHO) a very good way of dealing with concurrency in a single thread, and if you're careful in designing your approach early on, it maps cleanly to a thread/process based approach.
If you think I'm talking a load of sphericals (for all I know it's possible I am, though I hope I'm not :-), please look at our tutorial first, then at our howto for building components[*], and tell me what we're doing wrong. I'd really like to know so we can make the system better, easier for newbies (and hence everyone else), and more trustable.

[*] http://tinyurl.com/dp8n7
(This really feels more like a comp.lang.python discussion though, because AFAICT Python already has everything we need for this. I might revisit that thought when we've looked at shared memory issues. IMHO though that would largely be stuff for the standard library.)
Best Regards,
Michael.
Michael Sparks, Senior R&D Engineer, Digital Media Group
Michael.Sparks at rd.bbc.co.uk, http://kamaelia.sourceforge.net/
British Broadcasting Corporation, Research and Development
Kingswood Warren, Surrey KT20 6NP
This e-mail may contain personal views which are not the views of the BBC.