[Python-Dev] Proposal: defaultdict (original) (raw)

Guido van Rossum guido at python.org
Fri Feb 17 20:09:47 CET 2006


On 2/16/06, Guido van Rossum <guido at python.org> wrote:

Over lunch with Alex Martelli, he proposed that a subclass of dict with this behavior (but implemented in C) would be a good addition to the language. It looks like it wouldn't be hard to implement. It could be a builtin named defaultdict. The first, required, argument to the constructor should be the default value. Remaining arguments (even keyword args) are passed unchanged to the dict constructor.

Thanks for all the constructive feedback. Here are some responses and a new proposal.

So here's a new proposal.

Let's add a generic missing-key handling method to the dict class, as well as a default_factory slot initialized to None. The implementation is like this (but in C):

def on_missing(self, key): if self.default_factory is not None: value = self.default_factory() self[key] = value return value raise KeyError(key)

When getitem() (and only getitem()) finds that the requested key is not present in the dict, it calls self.on_missing(key) and returns whatever it returns -- or raises whatever it raises. getitem() doesn't need to raise KeyError any more, that's done by on_missing().

The on_missing() method can be overridden to implement any semantics you want when the key isn't found: return a value without inserting it, insert a value without copying it, only do it for certain key types/values, make the default incorporate the key, etc.

But the default implementation is designed so that we can write

d = {} d.default_factory = list

to create a dict that inserts a new list whenever a key is not found in getitem(), which is most useful in the original use case: implementing a multiset so that one can write

d[key].append(value)

to add a new key/value to the multiset without having to handle the case separately where the key isn't in the dict yet. This also works for sets instead of lists:

d = {} d.default_factory = set ... d[key].add(value)

I went through several iterations to obtain this design; my first version of on_missing() would just raise KeyError(key), requiring you to always provide a subclass; this is more minimalistic but less useful and would probably raise the bar for using the feature to some extent.

To saev you attempts to simplify this, here are some near-misses I considered that didn't quite work out:

This would require the multiset example to subclass, since default_factory doesn't see the key so it can't insert it.

This appears to fix that problem, but now you can't write "d.default_value = list" since (a) list(key) doesn't return an empty list, and (b) it also doesn't insert the key into the dict; attempting to assign a callback function to default_factory that solves these issues fail because the callback doesn't have access to the dict instance (unless there's only one).

This is less general in case you want different default semantics (e.g. not inserting the default, or making the default a function of the key) -- you'd have to override getitem() for that, which means you'd be paying overhead even for keys that are present.

I'll try to cook up an implementation on SF after I've dug myself out of the day's email barrage.

-- --Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list