[Python-Dev] pyparallel and new memory API discussions...

Trent Nelson trent at snakebite.org
Wed Jun 19 16:33:10 CEST 2013


Hi Charles-François!

Good to hear from you again.  It was actually your e-mail a few
months ago that acted as the initial catalyst for this memory
protection idea, so, thanks for that :-)

Answer below.

On Wed, Jun 19, 2013 at 07:01:49AM -0700, Charles-François Natali wrote:

> 2013/6/19 Trent Nelson <trent at snakebite.org>:
> >
> > The new memory API discussions (and PEP) warrant a quick pyparallel
> > update: a couple of weeks after PyCon, I came up with a solution for
> > the biggest show-stopper that has been plaguing pyparallel since its
> > inception: being able to detect the modification of "main thread"
> > Python objects from within a parallel context.
> >
> > For example, data.append(4) in the example below will generate an
> > AssignmentError exception, because data is a main thread object, and
> > data.append(4) gets executed from within a parallel context::
> >
> >     data = [ 1, 2, 3 ]
> >
> >     def work():
> >         data.append(4)
> >
> >     async.submit_work(work)
> >
> > The solution turned out to be deceptively simple:
> >
> >   1. Prior to running parallel threads, lock all "main thread"
> >      memory pages as read-only (via VirtualProtect on Windows,
> >      mprotect on POSIX).
> >
> >   2. Detect attempts to write to main thread pages during parallel
> >      thread execution (via SEH on Windows or a SIGSEGV trap on POSIX),
> >      and raise an exception instead (detection is done in the ceval
> >      frame exec loop).
>
> Quick stupid question: because of refcounts, the pages will be written
> to even in case of read-only access. How do you deal with this?

Easy: I don't refcount in parallel contexts :-)

There's no need, for two reasons:

 1. All memory allocated in a parallel context is localized to a
    private heap.  When the parallel context is finished, the entire
    heap can be blown away in one fell swoop.  There's no need for
    reference counting or GC because none of the objects will exist
    after the parallel context completes.

 2. The main thread won't be running when parallel threads/contexts
    are executing, which means main thread objects being accessed in
    parallel contexts (read-only access is fine) won't be suddenly
    free()'d or GC-collected or whatever.
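Point 1 is essentially an arena (region) allocator. Here's a toy bump-pointer version in pure Python, purely to illustrate the idea; the Arena class and its alloc/reset methods are invented for this sketch and are not how pyparallel's heaps are actually implemented. Each allocation just advances an offset into one buffer, nothing is ever freed individually, and finishing the context rewinds the offset, discarding everything at once with no per-object bookkeeping.

```python
class Arena:
    """Toy bump-pointer heap (hypothetical, illustration only).

    Allocations are never freed individually; reset() discards
    everything allocated since the last reset in O(1).
    """

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)  # one contiguous backing buffer
        self.offset = 0             # next free byte (the "bump pointer")

    def alloc(self, nbytes: int, align: int = 8) -> memoryview:
        # Round the current offset up to the requested power-of-two alignment.
        start = (self.offset + align - 1) & ~(align - 1)
        end = start + nbytes
        if end > len(self.buf):
            raise MemoryError("arena exhausted")
        self.offset = end
        return memoryview(self.buf)[start:end]

    def reset(self) -> None:
        # "Blow away" the whole heap: just rewind the bump pointer.
        self.offset = 0


# Usage: one arena per parallel context.
ctx = Arena(1024)
obj = ctx.alloc(16)
obj[:5] = b"hello"
ctx.reset()  # everything allocated in the context is now gone
```

Note that views handed out before reset() still point into the buffer and would alias later allocations; that is only safe because, as point 1 says, nothing created in the context outlives it.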

You get credit for that second point; you asked a similar question a
few months ago that made me realize I absolutely couldn't have the
main thread running at the same time the parallel threads were
running.

Once I accepted that as a design constraint, everything else came
together nicely... "Hmmm, if the main thread isn't running, it won't
need write-access to any of its pages!  If we mark them read-only,
we could catch the traps/SEHs from parallel threads, then raise an
exception, ahh, simple!".
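For anyone wanting to poke at the POSIX half of this, here's a minimal sketch (not pyparallel code; write_faults is a made-up helper) that uses ctypes to call mprotect on an anonymous mmap'd page and then writes to it. It only covers step 1 and the trap itself: a Python-level signal handler can't resume after a synchronous SIGSEGV, so the write happens in a forked child and the parent just checks how the child died. It assumes Linux or macOS, where ctypes.CDLL(None) exposes the C library's mprotect.

```python
import ctypes
import mmap
import os
import signal

# Assumption: POSIX platform where dlopen(NULL) exposes libc's mprotect.
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def write_faults(protect: bool) -> bool:
    """Return True if a write to the page kills the writer with a memory fault."""
    size = mmap.PAGESIZE
    page = mmap.mmap(-1, size)  # anonymous, page-aligned, read/write mapping
    addr = ctypes.addressof(ctypes.c_char.from_buffer(page))

    if protect:
        # Step 1: mark the "main thread" page read-only.
        if libc.mprotect(addr, size, mmap.PROT_READ) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))

    pid = os.fork()
    if pid == 0:
        # The child stands in for a parallel context and attempts a write.
        ctypes.memset(addr, 0x41, 1)
        os._exit(0)  # only reached if the write did not fault

    _, status = os.waitpid(pid, 0)
    # Linux reports SIGSEGV; macOS may report SIGBUS for protection faults.
    return os.WIFSIGNALED(status) and os.WTERMSIG(status) in (
        signal.SIGSEGV,
        signal.SIGBUS,
    )


print(write_faults(False))  # the child's write succeeds
print(write_faults(True))   # the child's write traps
```

In the scheme described above, that fault would be intercepted (SEH on Windows, a SIGSEGV trap on POSIX) and turned into an exception instead of killing anything.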

I'm both chuffed at how simple it is (considering it was *the* major
show-stopper), and miffed at how it managed to elude me for so long
;-)

Regards,

    Trent.

