[Python-Dev] counterintuitive behavior (bug?) in Counter with +=

Lars Buitinck L.J.Buitinck at uva.nl
Mon Oct 3 12:12:47 CEST 2011


Hello,

[First off, I'm not a member of this list, so please Cc: me in a reply!]

I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:

count_total = Counter()
count_per_doc = []
for doc in documents:
    count_current = Counter(analyze(doc))
    count_total += count_current
    count_per_doc.append(count_current)

Because we target Python 2.5+, I implemented a lightweight replacement with just the functionality we need, including __iadd__, but then my co-developer ran the above code on Python 2.7 and performance was horrible. After some digging, I found out that Counter [2] does not have __iadd__, so += falls back to __add__, which copies the entire left-hand side!
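
To illustrate the cost difference, here is a minimal sketch (the variable names are made up for this example) contrasting the + operator, which builds a new Counter on every call, with Counter.update, which adds the counts into the existing object:

```python
from collections import Counter

total = Counter("aab")
alias = total                    # second name for the same object

summed = total + Counter("bc")   # __add__ builds a brand-new Counter
total.update(Counter("bc"))      # update adds counts in place, no copy

print(summed is alias)   # the sum is a separate object
print(total is alias)    # update kept the original object
print(total == summed)   # same counts either way (all counts are positive here)
```

In a loop over many documents, the in-place form does work proportional to each new document, while the copying form also re-copies everything accumulated so far on each iteration.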

I also figured out that I should use the update method instead, which I will, but I still find that uglier than +=. I would submit a patch to implement __iadd__, but I first want to know if that's considered the right behavior, since it changes the semantics of +=:

>>> from collections import Counter
>>> a = Counter([1,2,3])
>>> b = a
>>> a += Counter([3,4,5])
>>> a is b
False

would become

# snip
>>> a is b
True
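
For concreteness, a rough sketch of what such an in-place addition might look like, written as a subclass. InPlaceCounter is a hypothetical name used only for this illustration; it is not the proposed patch and not part of the standard library:

```python
from collections import Counter


class InPlaceCounter(Counter):
    """Hypothetical Counter subclass whose += mutates the left-hand side."""

    def __iadd__(self, other):
        if not isinstance(other, Counter):
            return NotImplemented
        # Add the counts from other into self without copying self.
        self.update(other)
        return self


a = InPlaceCounter([1, 2, 3])
b = a
a += Counter([3, 4, 5])

print(a is b)   # a and b still name the same object
print(a[3])     # the element 3 was counted once on each side
```

One difference to keep in mind: Counter's + drops entries whose resulting count is zero or negative, whereas update keeps them, so this sketch only matches + exactly when all counts stay positive.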

TIA, Lars

[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
[2] http://hg.python.org/cpython/file/tip/Lib/collections/__init__.py#l399

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam


