[Python-Dev] pyparallel and new memory API discussions...
Trent Nelson trent at snakebite.org
Wed Jun 19 16:33:10 CEST 2013
Hi Charles-François!
Good to hear from you again. It was actually your e-mail a few
months ago that acted as the initial catalyst for this memory
protection idea, so, thanks for that :-)
Answer below.
On Wed, Jun 19, 2013 at 07:01:49AM -0700, Charles-François Natali wrote:
> 2013/6/19 Trent Nelson <trent at snakebite.org>:
> > The new memory API discussions (and PEP) warrant a quick pyparallel
> > update: a couple of weeks after PyCon, I came up with a solution for
> > the biggest show-stopper that has been plaguing pyparallel since its
> > inception: being able to detect the modification of "main thread"
> > Python objects from within a parallel context.
> >
> > For example, data.append(4) in the example below will generate an
> > AssignmentError exception, because data is a main thread object, and
> > data.append(4) gets executed from within a parallel context::
> >
> >     data = [ 1, 2, 3 ]
> >
> >     def work():
> >         data.append(4)
> >
> >     async.submit_work(work)
> >
> > The solution turned out to be deceptively simple:
> >
> >     1. Prior to running parallel threads, lock all "main thread"
> >        memory pages as read-only (via VirtualProtect on Windows,
> >        mprotect on POSIX).
> >
> >     2. Detect attempts to write to main thread pages during parallel
> >        thread execution (via SEH on Windows or a SIGSEGV trap on POSIX),
> >        and raise an exception instead (detection is done in the ceval
> >        frame exec loop).
>
> Quick stupid question: because of refcounts, the pages will be written
> to even in case of read-only access. How do you deal with this?
Easy: I don't refcount in parallel contexts :-)
There's no need, for two reasons:
1. All memory allocated in a parallel context is localized to a
private heap. When the parallel context is finished, the entire
heap can be blown away in one fell swoop (see the sketch after
this list). There's no need for reference counting or GC because
none of the objects will exist after the parallel context
completes.
2. The main thread won't be running when parallel threads/contexts
are executing, which means main thread objects being accessed in
parallel contexts (read-only access is fine) won't be suddenly
free()'d or GC-collected or whatever.
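To make point 1 concrete, here's a minimal sketch of the idea: a
per-context bump allocator whose whole heap disappears in a single
free() when the context completes. The names (ParallelHeap, heap_*)
are made up for illustration; this is not pyparallel's actual
allocator code::

    /* Hypothetical sketch, not pyparallel's actual heap code: a
     * per-context bump allocator whose entire heap is released with
     * one free() when the parallel context completes, so no per-object
     * refcounting or GC is needed inside the context. */
    #include <stddef.h>
    #include <stdlib.h>

    typedef struct {
        char  *base;   /* start of the context's private heap */
        size_t size;   /* total heap size in bytes            */
        size_t used;   /* bump-pointer offset                 */
    } ParallelHeap;

    static int heap_init(ParallelHeap *h, size_t size)
    {
        h->base = malloc(size);
        h->size = size;
        h->used = 0;
        return h->base != NULL;
    }

    /* Every allocation made while the parallel context runs comes
     * from here; there is no matching per-object free(). */
    static void *heap_alloc(ParallelHeap *h, size_t n)
    {
        n = (n + 15) & ~(size_t)15;        /* keep 16-byte alignment */
        if (h->used + n > h->size)
            return NULL;                   /* real code would grow the heap */
        void *p = h->base + h->used;
        h->used += n;
        return p;
    }

    /* Context finished: the whole heap disappears in one call. */
    static void heap_destroy(ParallelHeap *h)
    {
        free(h->base);
        h->base = NULL;
        h->size = h->used = 0;
    }

The single-call teardown is what makes skipping refcounts safe:
nothing allocated inside the context is supposed to outlive it.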
You get credit for that second point; you asked a similar question a
few months ago that made me realize I absolutely couldn't have the
main thread running at the same time the parallel threads were
running.
Once I accepted that as a design constraint, everything else came
together nicely... "Hmmm, if the main thread isn't running, it won't
need write-access to any of its pages! If we mark them read-only,
we could catch the traps/SEHs from parallel threads, then raise an
exception, ahh, simple!".
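For the curious, here's roughly what the POSIX half of that looks
like in isolation. This is a standalone sketch, not the actual
pyparallel code: the names and the recovery strategy in the handler
are invented for illustration (pyparallel raises the exception from
the ceval loop rather than unprotecting the page)::

    /* Standalone sketch of the POSIX mechanism: mark a page read-only
     * with mprotect() and trap write attempts via SIGSEGV.  Here the
     * handler just records the fault and unprotects the page so the
     * faulting write can complete when the handler returns. */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void *page;                 /* stand-in for a "main thread" page */
    static volatile sig_atomic_t trapped;

    static void on_segv(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)info; (void)ctx;
        trapped = 1;
        mprotect(page, (size_t)sysconf(_SC_PAGESIZE),
                 PROT_READ | PROT_WRITE);   /* let the write proceed */
    }

    int main(void)
    {
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
        page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        strcpy(page, "main thread data");

        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        /* "Parallel context" starts: lock the page read-only. */
        mprotect(page, pagesize, PROT_READ);

        ((char *)page)[0] = 'X';   /* write attempt -> SIGSEGV -> handler */

        printf("trapped write to protected page: %s\n",
               trapped ? "yes" : "no");
        return 0;
    }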
I'm both chuffed at how simple it is (considering it was *the* major
show-stopper), and miffed at how it managed to elude me for so long
;-)
Regards,
Trent.