[Python-Dev] Repeatability of looping over dicts (original) (raw)

Gregory P. Smith greg at krypto.org
Sat Jan 5 07:17:35 CET 2008


On 1/4/08, A.M. Kuchling <amk at amk.ca> wrote:

This post describes work aimed at getting Django to run on Jython: http://zyasoft.com/pythoneering/2008/01/django-on-jython-minding-gap.html

One outstanding issue is whether to use Java's ConcurrentHashMap type to underly Jython's dict type. See <http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html>.

As a side note, you may also want to consider using a NonBlockingHashMap as implemented by Cliff Click instead of ConcurrentHashMap:

http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html http://blogs.azulsystems.com/cliff/2007/03/talking_with_go.html http://video.google.com/videoplay?docid=2139967204534450862 http://sourceforge.net/projects/high-scale-lib

I haven't looked to see if it has the same issues repeatability issues but it does scale even better.

Anyways, could you not use something with repeatability issues with a wrapper that caches lists of keys to satisfy the repeatability? (don't lock and maintain the list, just regenerate the list on demand when a caller needs it and discard the cached list anytime a key is added or deleted)

Also, should we drop this guarantee in py3k to give future implementations more flexibility?

-gps

ConcurrentHashMap scales better in the face of threading because it doesn't lock the whole table when updating it, but iterating over the map can return elements in a different order each time. This would mean that list(dictvar) doesn't return values in the same order as a later call to list(dictvar), even if dictvar hasn't been modified.

Why? Under the hood, there are 32 different locks, each guarding a subset of the hash buckets, so if there are multiple threads iterating over the dictionary, they may not go through the buckets in order. <http://www.ibm.com/developerworks/java/library/j-jtp08223/> discusses the implementation, at least in 2003. So, do Python implementations need to guarantee that list(dictvar) == a later result from list(dictvar)? --amk



More information about the Python-Dev mailing list