[Python-ideas] [Python-Dev] hello, new dict addition for new eve ? (original) (raw)

goodman.m.w at gmail.com goodman.m.w at gmail.com
Mon Jan 2 23:49:16 CET 2012


It's funny, I joined this list several days ago to ask about this very problem. Perhaps I can add something (there's code below if you want to skip the discussion):

I do research in computational linguistics, and adding/incrementing dictionary values is a very common task (e.g. counting unique words in a corpus). A defaultdict and a for loop have served me well toward this end (and Counter may be nice, but I haven't tried it), but I've always wanted something more functional. I found in Haskell a nice method Map.fromListWith that takes a list of (key, value) pairs and a function describing how to deal with collisions. It's useful and flexible.

Summing up what has been said, as I understand it, it's not obvious what a good behaviour for + is with dicts, and we can't assure it follows algebraic properties like commutativity. As Guido hinted at in the first messages, addition over disjoint dicts (and we can include the identity dict) should be the same no matter the operation. That is: {k1:v1} + {k2:v2} = {k1:v1, k2:v2} {k1:v1} + {} = {k1:v1}

The only interesting cases are when we have the same key: {k1:v1} + {k1:v2} = {k1:op(v1,v2)} where op is some binary function.

While dict doesn't have add defined, it does resolve conflicts by just using the latter value (i.e. op=lambda x, y: y), as in the following three statements: d = dict([(k1,v1), (k1,v2)]) # result: {k1:v2} d[k1] = v3 # result: {k1:v3} d.update([(k1,v4) # result: {k1:v4}

I think the latter two, at least, should not be counted as addition, but rather stay as assignment. Someone brought up the idea of a collision method that is called when setitem is called on an existing key. I like this idea, and there's probably some use for it, but for implementing addition it might not be practical (how, then, would you re-assign a key's value without forcing "op" to apply on the new and existing value?).

I propose something like the following proof-of-concept:

class AccumulationDict(dict): def init(self, accumulator, *args, **kwargs): if not callable(accumulator): raise TypeError('Accumulator must be callable.') self.accumulator = accumulator self.accumulate(*args, **kwargs)

def __additem__(self, key, value):
    if key in self:
        self[key] = self.accumulator(self[key], value)
    else:
        self[key] = value

def __add__(self, other):
    result = AccumulationDict(self.accumulator, self) # check if

other's accumulator is same? result.accumulate(other) return result

def accumulate(self, *args, **kwargs):
    for arg in args:
        if isinstance(arg, list):
            for (key, value) in arg:
                self.__additem__(key, value)
        elif isinstance(arg, dict):
            for (key, value) in arg.items():
                self.__additem__(key, value)
        else:
            raise TypeError('Argument must be of type list or dict.')
    for key in kwargs:
        self.__additem__(key, kwargs[key])

Which has the following properties:

d = AccumulationDict(operator.add, [('a',1),('b',1),('a',1)], b=1) d {'a': 2, 'b': 2} d['a'] = 0 d {'a': 0, 'b': 2} d + {'a':3,'b':1} {'a': 3, 'b': 3} d {'a': 0, 'b': 2} d['c'] Traceback (most recent call last): File "", line 1, in KeyError: 'c' d.update({'c':1}) d {'a': 0, 'c': 1, 'b': 2} d.accumulate({'c':1}) d {'a': 0, 'c': 2, 'b': 2}

I hope this contributes to the discussion, and I'd be happy to hear your thoughts.

Thanks,

-- -Michael Wayne Goodman



More information about the Python-ideas mailing list