Issue 13121: collections.Counter's += copies the entire object (original) (raw)

Created on 2011-10-07 08:50 by lars, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
cpython-counter-iadd.diff lars,2011-10-07 08:50 Patch that adds __iadd__ to collections.Counter review
counter.diff rhettinger,2011-10-19 05:06 Patch for in-place multiset operations review
Messages (5)
msg145063 - (view) Author: Lars (lars) Date: 2011-10-07 08:50
I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows: count_total = Counter() for doc in documents: count_current = Counter(analyze(doc)) count_total += count_current count_per_doc.append(count_current) Performance was horrible. After some digging, I found out that Counter [2] does not have __iadd__ and += copies the entire left-hand side in __add__. I've attached a patch that fixes the issue (for += only, and I've not patched the testsuite.) [1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
msg145065 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-10-07 08:58
This is slightly backwards incompatible, as some people may depend on the old behavior. Otherwise sounds like a good idea to me.
msg145067 - (view) Author: Lars (lars) Date: 2011-10-07 09:17
If this is not implemented because it is backwards incompat, then it might be useful to add a note to update's docstring explaining that it is much more efficient than +=. I was very surprised that it took *minutes* to add a few thousand moderate-sized Counters.
msg145071 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-10-07 11:49
I'll add the in-place methods including __iadd__, __isub__, __iand__, and __ior__. If speed is your issue, you should continue to use the update() method which will always be faster because it doesn't have a step to strip zeros and negative values from the existing Counter. Also, update() has a fast-path written in C.
msg145959 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-19 20:40
New changeset 5cced40374df by Raymond Hettinger in branch 'default': Issue #13121: Support in-place math operators for collections.Counter(). http://hg.python.org/cpython/rev/5cced40374df
History
Date User Action Args
2022-04-11 14:57:22 admin set github: 57330
2011-10-19 20:41:20 rhettinger set status: open -> closedresolution: fixed
2011-10-19 20:40:49 python-dev set nosy: + python-devmessages: +
2011-10-19 05:06:59 rhettinger set files: + counter.diff
2011-10-07 11:49:28 rhettinger set priority: normal -> lowmessages: +
2011-10-07 09:17:07 lars set messages: +
2011-10-07 09:11:35 mark.dickinson set type: behavior -> enhancement
2011-10-07 09:11:18 mark.dickinson set assignee: rhettinger
2011-10-07 08:58:13 petri.lehtinen set versions: + Python 3.3, - Python 3.4nosy: + rhettinger, petri.lehtinenmessages: + stage: patch review
2011-10-07 08:50:33 lars create