[Python-Dev] collections module

Guido van Rossum guido at python.org
Mon Jan 12 16:43:56 EST 2004


> > Guess which of these two statements is faster and by how much:
> >
> >      list(itertools.repeat(None, 5000))
> >      list(tuple(itertools.repeat(None, 5000)))
> 
> I did guess, and was correct to 6 significant decimal digits <wink>.
> 
> > The answer is important because it points the way to faster list
> > comprehensions and other list building applications.
> 
> That Python's builtin list calls realloc() on every append is pretty
> gross, but it's also a tradeoff, saving the bytes in the list header
> that would be required to remember how many allocated-but-unused
> slots exist.  If you've got a million tiny lists (and some apps do),
> burning an additional 8MB for something your app doesn't really need
> isn't attractive (on a 32-bit box today, the list header is 16
> bytes; pymalloc aligns to 8-byte boundaries, so the next possible
> size is 24 bytes).  list's very small overallocation (in the
> ballpark of 5%) is also aimed at minimizing memory burden.  Speed
> isn't everything, tradeoffs are hard, and something loses when
> something else is favored.  (BTW, I never would have considered
> implementing a resizable list type *without* keeping a field to
> record the amount of overallocation, so this isn't the tradeoff I
> would have made <wink>.)

Just in case nobody had thought of it before, couldn't the realloc()
call be avoided when roundupsize(oldsize) == roundupsize(newsize)???

> > ...
> > And my interest is in a collections module which should probably
> > published elsewhere (too much negativity on the subject here).
> 
> A collections *package* is a fine idea.  I say "package" instead of
> "module" because there are literally hundreds of container types,
> and queues and bags are a tiny corner of that space.  An example
> dear to my heart is BTrees (I maintain Zope's BTree implementation),
> which is a wonderful way to get an associative mapping sorted by key
> value.  BTrees have a large API all by themselves, so it wouldn't
> make sense to try to squash them into a module along with 50 other
> container types.

Right.  +1 for a package.

--Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list