Issue 13121: collections.Counter's += copies the entire object (original) (raw)

Created on 2011-10-07 08:50 by lars, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
cpython-counter-iadd.diff	lars,2011-10-07 08:50	Patch that adds __iadd__ to collections.Counter	review
counter.diff	rhettinger,2011-10-19 05:06	Patch for in-place multiset operations	review

Messages (5)
msg145063 - (view)	Author: Lars (lars)	Date: 2011-10-07 08:50
I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows: count_total = Counter() for doc in documents: count_current = Counter(analyze(doc)) count_total += count_current count_per_doc.append(count_current) Performance was horrible. After some digging, I found out that Counter [2] does not have __iadd__ and += copies the entire left-hand side in __add__. I've attached a patch that fixes the issue (for += only, and I've not patched the testsuite.) [1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
msg145065 - (view)	Author: Petri Lehtinen (petri.lehtinen) *	Date: 2011-10-07 08:58
This is slightly backwards incompatible, as some people may depend on the old behavior. Otherwise sounds like a good idea to me.
msg145067 - (view)	Author: Lars (lars)	Date: 2011-10-07 09:17
If this is not implemented because it is backwards incompat, then it might be useful to add a note to update's docstring explaining that it is much more efficient than +=. I was very surprised that it took minutes to add a few thousand moderate-sized Counters.
msg145071 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2011-10-07 11:49
I'll add the in-place methods including __iadd__, __isub__, __iand__, and __ior__. If speed is your issue, you should continue to use the update() method which will always be faster because it doesn't have a step to strip zeros and negative values from the existing Counter. Also, update() has a fast-path written in C.
msg145959 - (view)	Author: Roundup Robot (python-dev)	Date: 2011-10-19 20:40
New changeset 5cced40374df by Raymond Hettinger in branch 'default': Issue #13121: Support in-place math operators for collections.Counter(). http://hg.python.org/cpython/rev/5cced40374df

History
Date	User	Action	Args
2022-04-11 14:57:22	admin	set	github: 57330
2011-10-19 20:41:20	rhettinger	set	status: open -> closedresolution: fixed
2011-10-19 20:40:49	python-dev	set	nosy: + python-devmessages: +
2011-10-19 05:06:59	rhettinger	set	files: + counter.diff
2011-10-07 11:49:28	rhettinger	set	priority: normal -> lowmessages: +
2011-10-07 09:17:07	lars	set	messages: +
2011-10-07 09:11:35	mark.dickinson	set	type: behavior -> enhancement
2011-10-07 09:11:18	mark.dickinson	set	assignee: rhettinger
2011-10-07 08:58:13	petri.lehtinen	set	versions: + Python 3.3, - Python 3.4nosy: + rhettinger, petri.lehtinenmessages: + stage: patch review
2011-10-07 08:50:33	lars	create