[Python-Dev] deja-vu .. python locking (original) (raw)
Martin Devera devik at cdi.cz
Mon Sep 18 15:46:02 CEST 2006
- Previous message: [Python-Dev] BRANCH FREEZE/IMMINENT RELEASE: Python 2.5 (final). 2006-09-19, 00:00UTC
- Next message: [Python-Dev] deja-vu .. python locking
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello,
as someone has written in FAQ, sometimes someone starts a thread about finer grained locking in Python. Ok here is one.
I don't want to start a flamewar. I only seek suggestions and constructive critic. I have some ideas whose are new in this context (I believe) and I only wanted to make them public in case someone finds them interesting.
Comments are welcome. Martin
Round 1, Greg Stein's patch The patch removes GIL from version 1.6 and replaces locking of list, dict and other structures with finer grained locking. The major slowdown seems to be in list and dict structures, dicts are used for object attributes and these are accessed quite often.
Because (IIUC) mutex struct is quite heavy, dict and list are locked via pool of locks. When you lock this pooled lock you have to lock two locks in reality. One locks pool itself, and other locks the pooled lock (the second locking can be omited in non contended case because locks in the pool are in locked state). One lock take about 25 cycles on UP P4 (mainly pipeline flush during memory barrier) and can be even more expensive (hundreds of cycles) due to cacheline move between CPUs on SMP machine. "Global" pool lock is subject to cacheline pinpong as it will be often reacquired by competing CPUs. In mappinglookup there is lookmapping guarded by this locking scheme, lookmapping itself has about 20 cycles in the best (one hope typical) case plus compareobj cost (in case of string keys say ... 50..100 cycles?). Thus locking/unlocking the read takes 50..100 cycles and operation itself is 70-120 cycles. One might expect about 50% slowdown in dict read path.
RCU like locking Solution I have in mind is similar to RCU. In Python we have quiscent state - when a thread returns to main loop of interpreter. Let's add "owner_thread" field to locked object. It reflects last thread (its id) which called any lockable method on the object. Each LOCK operation looks like: while (ob->owner_thread != self_thread()) { unlock_mutex(thread_mutex[self_thread()]) // wait for owning thread to go to quiscent state lock_mutex(thread_mutex[ob->owner_thread]) ob->owner_thread = self_thread() unlock_mutex(thread_mutex[ob->owner_thread]) lock_mutex(thread_mutex[self_thread()]) } Unlock is not done - we own the object now and can use it without locking (until we return to interpreter loop or we call LOCK on other object). For non-shared objects there is only penalty of ob->owner_thread != self_thread() condition. Not sure about Windows, but in recent Linuxes one can use %gs register as thread id, thus compare is about 3 cycles (and owner_thread should be in object's cacheline anyway). In contended case there is some cache pingpong with ob and mutex but it is as expected.
Deadlocks Our object ownership is long - from getting it in LOCK to next quiscent state of the thread. Thus when two threads want to step each on other's object, they will deadlock. Simple solution is to extend set of quiscent states. It is when thread releases its thread_mutex in main loop (and immediately reacquires). Additionaly it can release it just before it is going to wait on another thread's mutex, like in LOCK (already in code above). If you use LOCK correctly then when you are LOCKing an object you can't be in vulnerable part of OTHER object. So that let other threads to get ownership of your own objects in that time. One can also want to release his lock when going to lock mutex in threading package and in other places where GIL is released today. However I admit that I did no formal proof regarding deadlock, I plan to do it if nobody can find other flaw in the proposal.
Big reader lock While above scheme might work well, it'd impose performance penalty for shared dicts which are almost read only (module.dict). For these similar locking can be used, only writer has to wait until ALL other threads enter quiscent state (take locks of them), then perform change and unlock them all. Readers can read without any locking.
Compatibilty with 3rd party modules I've read this argument on pydev list. Maybe I'm not understanding something, but is it so complex for Py_InitModule4 to use extra flag in apiver for example ? When at least one non-freethreaded module is loaded, locking is done in old good way...
- Previous message: [Python-Dev] BRANCH FREEZE/IMMINENT RELEASE: Python 2.5 (final). 2006-09-19, 00:00UTC
- Next message: [Python-Dev] deja-vu .. python locking
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]