Performance of lists vs. list comprehensions

Wolfram Hinderer wolfram.hinderer at googlemail.com
Tue Jan 19 16:15:14 EST 2010


On 19 Jan., 21:06, Gerald Britton <gerald.brit... at gmail.com> wrote:
> [snip]
>
>
>
> > Yes, list building from a generator expression *is* expensive. And
> > join has to do it, because it has to iterate twice over the iterable
> > passed in: once for calculating the memory needed for the joined
> > string, and once more to actually do the join (this is implementation
> > dependent, of course). If the iterable is a list already, the list
> > building is not needed.
>
> if join has to iterate twice over the iterable, how does this work?
>
> $ python3.1
> Python 3.1.1+ (r311:74480, Nov  2 2009, 14:49:22)
> [GCC 4.4.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> l = map(str, (x for x in range(10) if int(x)%2))
> >>> '.'.join(l)
> '1.3.5.7.9'
>
> If join had to iterate twice over l, it would be consumed on the first
> pass.

Yes. (Coincidentally, l is consumed in the first execution of the Timer
()-statement, which is why I had to add the call to list. Not to
mention the xrange example of Stephen. But all this is not linked to
what join does internally.)

> If it works as you say then join would have to copy the
> iterable on the first pass, effectively turning it into a list.

Yes, that's what I'm saying above in the first two lines of what you
quoted.
I should have added somehting like "AFAIK CPython does it that way".

> Though I didn't read through it, I would suppose that join could use a
> dynamic-table approach to hold the result, starting with some
> guesstimate then expanding the result buffer if and when needed.

Probably. But that's not what happens. Try
"".join("" for x in range(10**10)
and watch join eating memory.





More information about the Python-list mailing list