[Python-ideas] Length hinting and preallocation for container types

Eli Bendersky eliben at gmail.com
Tue Mar 5 18:15:37 CET 2013


On Tue, Mar 5, 2013 at 9:05 AM, Christian Heimes <christian at python.org>wrote:

> Hello,
>
> today I came across this slides
> https://speakerdeck.com/alex/why-python-ruby-and-javascript-are-slow by
> Alex Gaynor. The slides aren't some random rants on Python. Alex makes
> some valid points. Coincidentally  he is a PyPy developer, too. ;)
>
> One of his assertions is about memory (re)allocation. C is faster in
> some cases because C code usually does fewer allocations and
> reallocations. Python has no API to easily reallocate a list with 100
> items. Code like
>
>   lst = []
>   for i in range(100):
>       lst.append(i*i)
>
> has to resize the list multiple times. PyPy has a feature to create a
> preallocated list. https://bitbucket.org/pypy/pypy/commits/2ff5e3c765ef/
>
> Internally CPython already distinguishes between the length of object
> and the allocation size of an object for some types like list, dict and
> bytearray. For example PyListObject has `ob_size` for __len__ and
> `allocated` for the amount of available `ob_item` slots.
>
> I suggest that we add two new functions to container types like list,
> bytearray and dict. obj.__preallocate__(size) increases the internal
> buffer by size elements. obj.__shrink__() dwindles the internal buffer
> to len(obj) elements, maybe a bit more.
>
> A new context manager aids users with preallocation and shrinking:
>
> class LengthHint:
>     def __init__(self, container, hint):
>         self.container = container
>         self.hint = hint
>         self.instance = None
>
>     def __enter__(self):
>         self.instance = self.container()
>         self.instance.__preallocate__(self.hint)
>         return self.instance
>
>     def __exit__(self, exc_type, exc_val, exc_tb):
>         self.instance.__shrink__()
>
>
> with LengthHint(list, 200) as lst:
>     # lst has 200 ob_item slots but len(lst) == 0
>     for i in range(100):
>         lst.append(i*i)
> # __exit__ shrinks ob_item to 100
>

The real problem is that this code is not idiomatic Python, especially if
you want it to be reasonably fast:

  lst = []
  for i in range(100):
      lst.append(i*i)

Why not:

lst = [i*i for i in range(100)]

If the "append" pattern is complex, just "preallocate" like this:

lst = [0] * 100

And then fill it.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130305/23419766/attachment.html>


More information about the Python-ideas mailing list