[New-bugs-announce] [issue13121] collections.Counter's += copies the entire object

Lars Buitinck report at bugs.python.org
Fri Oct 7 10:50:34 CEST 2011


New submission from Lars Buitinck <L.J.Buitinck at uva.nl>:

I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:

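   from collections import Counter

   count_total = Counter()
   count_per_doc = []
   for doc in documents:
       count_current = Counter(analyze(doc))
       # += falls back to __add__, which rebuilds count_total on every pass
       count_total += count_current
       count_per_doc.append(count_current)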

Performance was horrible. After some digging, I found that Counter [2] does not define __iadd__, so += falls back to __add__, which copies the entire left-hand side on every iteration. I've attached a patch that fixes the issue (for += only; I have not patched the test suite).
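
For comparison, here is a minimal sketch of accumulating in place with Counter.update(), which adds counts into the existing object instead of building a new one. The toy documents and the whitespace-splitting analyze() are hypothetical stand-ins, not the scikit-learn code, and whether += rebinds the name depends on whether the interpreter's Counter defines __iadd__.

```python
from collections import Counter


def analyze(doc):
    # Hypothetical stand-in for the real analyzer: split on whitespace.
    return doc.split()


documents = ["a b a", "b c", "a c c"]  # toy data, for illustration only

# Pattern from the report: when Counter lacks __iadd__, each += calls
# __add__, which builds a new Counter from both operands.
total_add = Counter()
for doc in documents:
    total_add += Counter(analyze(doc))

# In-place alternative: update() mutates the existing Counter.
total_upd = Counter()
same_object = total_upd
for doc in documents:
    total_upd.update(analyze(doc))

# update() never rebinds the name, and both loops agree on the counts.
assert total_upd is same_object
assert total_add == total_upd
```

Both loops produce the same totals for positive counts; the difference is only in how much copying each iteration does.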

[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af

----------
components: Library (Lib)
files: cpython-counter-iadd.diff
keywords: patch
messages: 145063
nosy: larsmans
priority: normal
severity: normal
status: open
title: collections.Counter's += copies the entire object
type: behavior
versions: Python 3.4
Added file: http://bugs.python.org/file23336/cpython-counter-iadd.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13121>
_______________________________________

