[Python-Dev] Proposal: defaultdict (original) (raw)

Guido van Rossum guido at python.org
Fri Feb 17 20:09:47 CET 2006

Previous message: [Python-Dev] Proposal: defaultdict
Next message: [Python-Dev] Proposal: defaultdict
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2/16/06, Guido van Rossum <guido at python.org> wrote:

Over lunch with Alex Martelli, he proposed that a subclass of dict with this behavior (but implemented in C) would be a good addition to the language. It looks like it wouldn't be hard to implement. It could be a builtin named defaultdict. The first, required, argument to the constructor should be the default value. Remaining arguments (even keyword args) are passed unchanged to the dict constructor.

Thanks for all the constructive feedback. Here are some responses and a new proposal.

Yes, I'd like to kill setdefault() in 3.0 if not sooner.
It would indeed be nice if this was an optional feature of the standard dict type.
I'm ignoring the request for other features (ordering, key transforms). If you want one of these, write a PEP!
Many, many people suggested to use a factory function instead of a default value. This is indeed a much better idea (although slightly more cumbersome for the simplest cases).
Some people seem to think that a subclass constructor signature must match the base class constructor signature. That's not so. The subclass constructor must just be careful to call the base class constructor with the correct arguments. Think of the subclass constructor as a factory function.
There's a fundamental difference between associating the default value with the dict object, and associating it with the call. So proposals to invent a better name/signature for setdefault() don't compete. (As to one specific such proposal, adding an optional bool as the 3rd argument to get(), I believe I've explained enough times in the past that flag-like arguments that always get a constant passed in at the call site are a bad idea and should usually be refactored into two separate methods.)
The inconsistency introduced by getitem() returning a value for keys while get(), contains(), and keys() etc. don't show it, cannot be resolved usefully. You'll just have to live with it. Modifying get() to do the same thing as getitem() doesn't seem useful -- it just takes away a potentially useful operation.

So here's a new proposal.

Let's add a generic missing-key handling method to the dict class, as well as a default_factory slot initialized to None. The implementation is like this (but in C):

def on_missing(self, key): if self.default_factory is not None: value = self.default_factory() self[key] = value return value raise KeyError(key)

When getitem() (and only getitem()) finds that the requested key is not present in the dict, it calls self.on_missing(key) and returns whatever it returns -- or raises whatever it raises. getitem() doesn't need to raise KeyError any more, that's done by on_missing().

The on_missing() method can be overridden to implement any semantics you want when the key isn't found: return a value without inserting it, insert a value without copying it, only do it for certain key types/values, make the default incorporate the key, etc.

But the default implementation is designed so that we can write

d = {} d.default_factory = list

to create a dict that inserts a new list whenever a key is not found in getitem(), which is most useful in the original use case: implementing a multiset so that one can write

d[key].append(value)

to add a new key/value to the multiset without having to handle the case separately where the key isn't in the dict yet. This also works for sets instead of lists:

d = {} d.default_factory = set ... d[key].add(value)

I went through several iterations to obtain this design; my first version of on_missing() would just raise KeyError(key), requiring you to always provide a subclass; this is more minimalistic but less useful and would probably raise the bar for using the feature to some extent.

To saev you attempts to simplify this, here are some near-misses I considered that didn't quite work out:

def on_missing(self, key): if self.default_factory is not None: return self.default_factory() raise KeyError(key)

This would require the multiset example to subclass, since default_factory doesn't see the key so it can't insert it.

def on_missing(self, key): if self.default_factory is not None: return self.default_factory(key) raise KeyError(key)

This appears to fix that problem, but now you can't write "d.default_value = list" since (a) list(key) doesn't return an empty list, and (b) it also doesn't insert the key into the dict; attempting to assign a callback function to default_factory that solves these issues fail because the callback doesn't have access to the dict instance (unless there's only one).

Do away with on_missing() and just include its body at the end of getitem(), to be invoked when the key isn't found.

This is less general in case you want different default semantics (e.g. not inserting the default, or making the default a function of the key) -- you'd have to override getitem() for that, which means you'd be paying overhead even for keys that are present.

I'll try to cook up an implementation on SF after I've dug myself out of the day's email barrage.

-- --Guido van Rossum (home page: http://www.python.org/~guido/)

Previous message: [Python-Dev] Proposal: defaultdict
Next message: [Python-Dev] Proposal: defaultdict
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list