Performance of lists vs. list comprehensions

Arnaud Delobelle arnodel at googlemail.com
Tue Jan 19 16:01:31 EST 2010


Gerald Britton <gerald.britton at gmail.com> writes:

> [snip]
>
>>
>> Yes, list building from a generator expression *is* expensive. And
>> join has to do it, because it has to iterate twice over the iterable
>> passed in: once for calculating the memory needed for the joined
>> string, and once more to actually do the join (this is implementation
>> dependent, of course). If the iterable is a list already, the list
>> building is not needed.
>
> if join has to iterate twice over the iterable, how does this work?
>
> $ python3.1
> Python 3.1.1+ (r311:74480, Nov  2 2009, 14:49:22)
> [GCC 4.4.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> l = map(str, (x for x in range(10) if int(x)%2))
>>>> '.'.join(l)
> '1.3.5.7.9'
>>>>
>
> If join had to iterate twice over l, it would be consumed on the first
> pass.  If it works as you say then join would have to copy the
> iterable on the first pass, effectively turning it into a list.
> Though I didn't read through it, I would suppose that join could use a
> dynamic-table approach to hold the result, starting with some
> guesstimate then expanding the result buffer if and when needed.

Looking at the source (py3k):

PyObject *
PyUnicode_Join(PyObject *separator, PyObject *seq)
{
    [skip declarations]

    fseq = PySequence_Fast(seq, "");
    if (fseq == NULL) {
        return NULL;
    }

    [code that works out the length of the joined string then allocates
     memory, then fills it]
}

Where PySequence_Fast(seq, "") returns seq if seq is already a tuple or
a list and otherwise returns a new tuple built from the elements of seq.

So no, it doesn't guess the size of the joined string and yes, it
iterates twice over the "sequence" (I would have thought it should be
called an iterable) by copying it into a tuple.

-- 
Arnaud



More information about the Python-list mailing list