[Cython] Cython syntax to pre-allocate lists for performance

Robert Bradshaw robertwb at gmail.com
Thu Mar 7 19:44:48 CET 2013


On Thu, Mar 7, 2013 at 10:26 AM, Yury V. Zaytsev <yury at shurup.com> wrote:
> Hi Stefan,
>
> On Thu, 2013-03-07 at 12:21 +0100, Stefan Behnel wrote:
>
>> Note that Python has an algorithm for shrinking a list on appending, so
>> this might not be sufficient for your use case. What do you need it for?
>
> W00t! I didn't know about that.
>
> I'm wrapping a C++ code that should transmit large lists of objects to
> Python, while these objects are stored into something vector-like, which
> shouldn't get exposed directly. In the past, they did something like
>
>     obj = PyList_New(a.size());
>     for (a.begin(); a.end(); ++a) PyList_SetItem(obj, ...)
>
> I figured I can translate it into a while loop like
>
>     obj = []
>     while (it != a.end()):
>         obj.append(it)
>         inc(it)
>
> but then I'm not using the information about the size of a that I
> already have, and for huge lists this tends to be quite slow.
>
> I think this must be quite a common use case for bindings...
>
>> And why won't [None]*N help you out? It should be pretty cheap.
>
> It probably will, at least a bit. It just stroke me that if I'm going to
> do something along the lines of
>
>     idx = 0
>     obj = [None] * a.size()
>     while (it != a.end()):
>         obj[idx] = it
>         idx += 1
>         inc(it)
>
> I could also squeeze the last bits of performance by avoiding the
> creation of Nones and subsequently populating the list with them.
>
> If you say I have to use PyList_New directly, oh well... It's just that
> now since I'm rewriting the bindings in Cython anyways, I'm also trying
> to avoid using Python C API directly as much as possible.

I would time the two approaches to see if it really matters.

>> Won't list comprehensions work for you? They could potentially be adapted
>> to presize the list.
>
> I guess not.

    [o for o in a]

is nice and clean. If a has a size method (a common stl pattern), we
could optimistically call that to do the pre-allocation.

I don't know exactly what your usecase is, but you might consider
simply exposing a list-like wrapper supporting __getitem__ and
iteration, rather than eagerly converting the entire thing to a list.

- Robert


More information about the cython-devel mailing list