[Python-Dev] defaultdict proposal round three (original) (raw)

Ian Bicking ianb at colorstudy.com
Mon Feb 20 22:13:23 CET 2006


Alex Martelli wrote:

I prefer this approach over subclassing. The mental load from an additional method is less than the load from a separate type (even a subclass). Also, avoidance of invariant issues is a big plus. Besides, if this allows setdefault() to be deprecated, it becomes an all-around win.

I'd love to remove setdefault in 3.0 -- but I don't think it can be done before that: defaultfactory won't cover the occasional use cases where setdefault is called with different defaults at different locations, and, rare as those cases may be, any 2.* should not break any existing code that uses that approach.

Would it be deprecated in 2.*, or start deprecating in 3.0?

Also, is default_factory=list threadsafe in the same way .setdefault is? That is, you can safely do this from multiple threads:

d.setdefault(key, []).append(value)

I believe this is safe with very few caveats -- setdefault itself is atomic (or else I'm writing some bad code ;). My impression is that default_factory will not generally be threadsafe in the way setdefault is. For instance:

def make_list(): return [] d = dict d.default_factory = make_list

from multiple threads:

d.getdef(key).append(value)

This would not be correct (a value can be lost if two threads concurrently enter make_list for the same key). In the case of default_factory=list (using the list builtin) is the story different? Will this work on Jython, IronPython, or PyPy? Will this be a documented guarantee? Or alternately, are we just creating a new way to punish people who use threads? And if we push threadsafety up to user code, are we trading a very small speed issue (creating lists that are thrown away) for a much larger speed issue (acquiring a lock)?

I tried to make a test for this threadsafety, actually -- using a technique besides setdefault which I knew was bad (try:except KeyError:). And (except using time.sleep(), which is cheating), I wasn't actually able to trigger the bug. Which is frustrating, because I know the bug is there. So apparently threadsafety is hard to test in this case. (If anyone is interested in trying it, I can email what I have.)

Note that multidict -- among other possible concrete collection patterns (like Bag, OrderedDict, or others) -- can be readily implemented with threading guarantees.

-- Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org



More information about the Python-Dev mailing list