[Python-Dev] Slides from today's parallel/async Python talk

Trent Nelson trent at snakebite.org
Thu Apr 4 22:04:41 CEST 2013


Hi Charles-François,

On Thu, Apr 04, 2013 at 01:18:58AM -0700, Charles-François Natali wrote:

Just a quick implementation question (didn't have time to read through all your emails :-)

    async.submit_work(func, args, kwds, callback=None, errback=None)

How do you implement argument passing and the return value? E.g. let's say I pass a list as an argument: how do you iterate over the list from the worker thread without modifying the refcounts of the backing objects (IIUC you use a per-thread heap and don't do any refcounting)?

Correct, nothing special is done for the arguments, apart from
incref'ing them in the main thread before kicking off the parallel
thread, then decref'ing them in the main thread once we're sure the
parallel thread has finished.

Same thing for return value, how do you pass it to the callback?

For submit_work(), you can't :-)  In fact, an exception is raised if
func(), callback() or errback() attempts to return a non-None value.

It's worth noting that I eventually plan to have the map/reduce-type
functionality (similar to what multiprocessing offers) available via
a separate 'parallel' façade.  This will be geared towards programs
that are predominantly single-threaded, but have lots of data that
can be processed in parallel at various points.
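For a rough sense of the shape such a façade might take, here is a
hypothetical sketch; the name parallel_map_reduce and its parameters
are made up for illustration, and the standard library's
concurrent.futures stands in for the parallel machinery:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def parallel_map_reduce(map_func, reduce_func, data, initial, workers=4):
    """Hypothetical map/reduce-style façade: map in parallel across
    worker threads, then reduce on the calling (main) thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_func, data))
    return reduce(reduce_func, mapped, initial)

# e.g. a sum of squares over a mostly single-threaded program's data:
total = parallel_map_reduce(lambda x: x * x, lambda a, b: a + b,
                            range(5), 0)
```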

Now, with that being said, there are a few options available at the
moment if you want to communicate stuff from parallel threads back
to the main thread.  Originally, you could do something like this:

    d = async.dict()
    def foo():
        d['foo'] = async.rdtsc()
    def bar():
        d['bar'] = async.rdtsc()

    async.submit_work(foo)
    async.submit_work(bar)

But I recently identified a few memory-management flaws with that
approach.  (I'm still on the fence about this issue: initially I was
going to drop support entirely, but I've since had ideas to address
the memory problems, so we'll see.)

There's also this option:

    d = dict()

    @async.call_from_main_thread_and_wait
    def store(k, v):
        d[str(k)] = str(v)

    def foo():
        store('foo', async.rdtsc())

    def bar():
        store('bar', async.rdtsc())

    async.submit_work(foo)
    async.submit_work(bar)

(Not a particularly performant option, though; the main thread
 instantly becomes the bottleneck.)
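The bottleneck is easy to see if you model the pattern with the
standard library alone: every store() funnels through a single queue
that only the main thread drains, so all workers serialize on it.
This is a hedged sketch of the pattern, not the real async machinery:

```python
import queue
import threading

d = {}
requests = queue.Queue()   # workers enqueue writes; only main thread touches d

def store(k, v):
    """Stand-in for the @async.call_from_main_thread_and_wait pattern:
    hand the write to the main thread and block until it's done."""
    done = threading.Event()
    requests.put((str(k), str(v), done))
    done.wait()            # block until the main thread performs the store

threads = [threading.Thread(target=store, args=(n, v))
           for n, v in [('foo', 1), ('bar', 2)]]
for t in threads:
    t.start()

# The main thread drains the queue: every store funnels through here,
# which is exactly why it becomes the bottleneck.
for _ in range(len(threads)):
    k, v, done = requests.get()
    d[k] = v
    done.set()
for t in threads:
    t.join()
```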

Post-PyCon, I've been working on providing new interlocked data
types that are specifically designed to bridge the parallel/main-
thread divide:

    xl = async.xlist()
    def foo():
        xl.push(async.rdtsc())
    def bar():
        xl.push(async.rdtsc())

    async.submit_work(foo)
    async.submit_work(bar)

    while True:
        x = xl.pop()
        if not x:
            break
        process(x)

What's interesting about xlist() is that it takes ownership of the
parallel objects being pushed onto it.  That is, it basically clones
them, using memory allocated from its own internal heap (allowing
the parallel-thread's context heap to be freed, which is desirable).

The push/pop operations are interlocked at the C level, which
obviates the need for any explicit locking.
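A rough stand-in for those semantics using only the standard library:
in CPython, collections.deque's append() and pop() are effectively
atomic under the GIL (so no explicit lock is needed), and deep-copying
on push mimics xlist taking ownership of (cloning) the pushed object.
The XList name here is made up for illustration:

```python
import copy
import threading
from collections import deque

class XList:
    """Sketch of async.xlist() semantics: lock-free-looking push/pop
    (deque operations are atomic in CPython) plus clone-on-push."""
    def __init__(self):
        self._items = deque()

    def push(self, obj):
        # Clone the object, mimicking xlist taking ownership of it.
        self._items.append(copy.deepcopy(obj))

    def pop(self):
        try:
            return self._items.pop()
        except IndexError:
            return None            # empty: falsey, like the loop above expects

xl = XList()

def foo():
    xl.push({'source': 'foo'})

def bar():
    xl.push({'source': 'bar'})

threads = [threading.Thread(target=f) for f in (foo, bar)]
for t in threads:
    t.start()
for t in threads:
    t.join()

results = []
while True:
    x = xl.pop()
    if x is None:
        break
    results.append(x)
```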

I've put that work on hold for now though; I want to finish the
async client/server stuff (it's about 60-70% done) first.  Once
that's done, I'll tackle the parallel.*-type façade.

    Trent.
