[Python-Dev] deja-vu .. python locking (original) (raw)

Martin Devera devik at cdi.cz
Tue Sep 19 09:51:18 CEST 2006


Ah, I think I understand now. First the minor critique: I believe the locking algorithm isn't thread-safe:

while (ob->ownerthread != selfthread()) { unlockmutex(threadmutex[selfthread()]) // wait for owning thread to go to quiscent state lockmutex(threadmutex[ob->ownerthread]) ob->ownerthread = selfthread() unlockmutex(threadmutex[ob->ownerthread]) lockmutex(threadmutex[selfthread()]) } If two threads are competing for the same object held by a third thread, they may simultaneously enter the while loop, and then simultaneously try to lock the ownerthread. Now, one will win, and own the object. Later, the other will gain the lock, and unconditionally overwrite ownership. This will cause two threads to own the objects, which is an error.

oops .. well it seems as very stupid error on my side. Yes you are absolutely right, I'll have to rethink it. I hope it is possible to do it in correct way...

The more fundamental critique is: Why? It seems you do this to improve efficiency, (implicitly) claiming that it is more efficient to keep holding the lock, instead of releasing and re-acquiring it each time.

I claim that this doesn't really matter: any reasonable mutex implementation will be "fast" if there is no lock contention. On locking, it will not invoke any system call if the lock is currently not held (but just atomically test-and-set some field of the lock); on unlocking, it will not invoke any system call if the wait list is empty. As you also need to test, there shouldn't be much of a performance difference.

I measured it. Lock op in futex based linux locking is of the same speed as windows critical section and it is about 30 cycles on my P4 1.8GHz in uncontented case. As explained in already mentioned http://www.linuxjournal.com/article/6993 it seems due to pipeline flush during cmpxchg insn. And there will be cacheline transfer penalty which is much larger. So that mutex locking will take time comparable with protected code itself (assuming fast code like dict/list read). Single compare will take ten times less. Am I missing something ?

thanks, Martin



More information about the Python-Dev mailing list