Performance of lists vs. list comprehensions

Wed Jan 20 02:38:12 EST 2010

Steven D'Aprano, 20.01.2010 07:12:
> On Wed, 20 Jan 2010 05:25:22 +0100, Alf P. Steinbach wrote:
>> That is a good argument for not doing the expanding buffer thing.
>> But such buffers may be generally present anyway, resulting from
>> optimization of "+".
> 
> As near as I can determine, the CPython optimization you are referring to 
> doesn't use the "double the buffer when needed" technique. It operates on 
> a completely different strategy. As near as I can tell (as a non-C 
> speaker), it re-sizes the string in place to the size actually needed, 
> thus reducing the amount of copying needed.
> 
>> Using CPython 2.6.4 in Windows XP:
> [snip time measurements]
>> Here the linear increase of times indicate that the "+" is being
>> optimized using an expanding buffer for the string.
> 
> Be careful of jumping to the conclusion from timings that a certain 
> algorithm is used. All you can really tell is that, whatever the 
> algorithm is, it has such-and-such big-oh behaviour.

Which is particularly tricky here since the algorithms depends more on the
OS than on the code in CPython. The better timings come from the fact that
the OS does *not* need to copy the buffer on each iteration, but does
smarter things when asked to enlarge the buffer. If you ran the benchmark
on an OS that *did* copy the buffer each time, the runtime would really be
quadratic.

BTW, I think it would actually be worth trying to apply the same approach
to str.join() if the argument is not a sequence (obviously followed by a
benchmark on different platforms).

Stefan