Performance of lists vs. list comprehensions

Alf P. Steinbach alfps at start.no
Tue Jan 19 23:25:22 EST 2010


* Steven D'Aprano:
> On Tue, 19 Jan 2010 16:20:42 -0500, Gerald Britton wrote:
> 
>> That's surprising. I wouldn't implement it that way at all.  I'd use a
>> dynamically-expanding buffer as I suggested.  That way you get a single
>> pass and don't have to calculate anything before you begin.  In the best
>> case, you'd use half the memory (the result just fits in the buffer
>> after its last expansion and no saved tuple).  In the worst case, the
>> memory use is about the same (you just expanded the buffer using a 2x
>> expansion rule then you hit the last item).
> 
> In the worst case, you waste 50% of the memory allocated.

Yes. That is a good argument for not doing the expanding buffer thing. But such
buffers may be present anyway, as a by-product of the optimization of "+".

Using CPython 2.6.4 in Windows XP:


 >>> import timeit
 >>> def elapsed_time_for( f, n_calls ):
...     return timeit.timeit( f, number = n_calls )
...
 >>> def appender( n ):
...     def makestr( n = n ):
...         s = ""
...         for i in xrange( n ):
...             s = s + "A"
...         return s
...     return makestr
...
 >>> appender( 1000 )() == 1000*"A"
True
 >>>
 >>> for i in xrange( 10 ):
...     print( elapsed_time_for( appender( 10000*(i+1) ), 100 ) )
...
0.782596670811
1.37728454314
2.10189898437
2.76442173517
3.34536707878
4.08251830889
4.79620119317
5.42201844089
6.12892811796
6.84236460221
 >>> _


Here the linear increase in times indicates that "+" is being optimized using
an expanding buffer for the string. If only the minimal space were allocated
each time, one would expect O(n^2) behavior instead of the apparent O(n)
behavior above. An example of that O(n^2) behavior is given below.
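
As a sanity check on that reasoning, count characters copied rather than
seconds: appending one character to an exactly-sized string of length i copies
all i existing characters, so building a string of length n copies
1 + 2 + ... + (n-1) = n(n-1)/2 characters in total, versus O(n) total copying
with a doubling buffer. A small sketch of that bookkeeping (mine, not part of
the timings above):

  def copies_exact( n ):
      # Appending one char to an exactly-sized string of length i copies i chars.
      return sum( xrange( n ) )          # 0 + 1 + ... + (n-1) = n*(n-1)/2

  def copies_doubling( n ):
      # With a capacity-doubling buffer, existing chars are copied only on growth.
      total = 0
      capacity = 1
      while capacity < n:
          total += capacity              # copy contents into the bigger buffer
          capacity *= 2
      return total                       # bounded by 2*n, i.e. O(n)

  print( copies_exact( 100000 ) )        # 4999950000, quadratic
  print( copies_doubling( 100000 ) )     # 131071, linear

The doubling buffer wastes up to half its allocation, as Steven notes, but it
copies each character only a constant number of times on average.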


 >>> def exact_appender( n ):
 ...     def makestr( n = n ):
 ...         s = ""
 ...         for i in xrange( n ):
 ...             # Keeping a second reference to s defeats the in-place "+" optimization.
 ...             new_s = s + "A"
 ...             s = new_s
 ...         return s
 ...     return makestr
...
 >>> exact_appender( 1000 )() == 1000*"A"
True
 >>> for i in xrange( 10 ):
...     print( elapsed_time_for( exact_appender( 10000*(i+1) ), 100 ) )
...
3.28094241027
9.30584501661
19.5319170453
33.6563767183
52.3327800042
66.5475022663
84.8809736992
Traceback (most recent call last):
   File "<stdin>", line 2, in <module>
   File "<stdin>", line 2, in elapsed_time_for
   File "C:\Program Files\cpython\python26\lib\timeit.py", line 227, in timeit
     return Timer(stmt, setup, timer).timeit(number)
   File "C:\Program Files\cpython\python26\lib\timeit.py", line 193, in timeit
     timing = self.inner(it, self.timer)
   File "C:\Program Files\cpython\python26\lib\timeit.py", line 99, in inner
     _func()
   File "<stdin>", line 5, in makestr
KeyboardInterrupt
 >>> _


So, given that the simple "+" in the first example is apparently optimized
using an expanding buffer, which then hangs around, it's not clear to me that
the space optimization in "join" really helps. It may be (but isn't
necessarily) like shoveling snow in a snowstorm: the effort/cost could be for
naught.
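
One way to probe that, at least on the time axis, is to race "".join against
the optimized "+" loop directly. A quick sketch (the builder functions are
mine, and this measures time only, not the memory question):

  import timeit

  def build_plus( n ):
      s = ""
      for i in xrange( n ):
          s = s + "A"                    # eligible for the in-place "+" optimization
      return s

  def build_join( n ):
      return "".join( "A" for i in xrange( n ) )

  n = 100000
  print( timeit.timeit( lambda: build_plus( n ), number = 10 ) )
  print( timeit.timeit( lambda: build_join( n ), number = 10 ) )

If both come out roughly linear, the remaining argument for "join" is about
space, and about not depending on an optimization that's easy to defeat, as
the second example above shows.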


> And because 
> strings are immutable (unlike lists and dicts, which also use this 
> approach), you can never use that memory until the string is garbage 
> collected.

I think that the simple "+", with the apparent optimization shown in the first
example above, can reuse that space. I know for a fact that when you control a
string implementation it can do that (I have implemented exactly that). But I
don't know for a fact that it's practical to do so in Python. In order to reuse
the space the implementation must know that there's only one reference to the
string, and I don't know whether that information is readily available in a
CPython implementation (say), although I suspect that it is.
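
For what it's worth, reference counts are readily available in CPython: every
object carries one, and sys.getrefcount exposes it. A tiny sketch:

  import sys

  n = 10
  s = "A" * n                        # built at run time, no hidden extra references
  print( sys.getrefcount( s ) )      # 2: the name 's' plus getrefcount's argument

  t = s                              # bind a second name to the same object
  print( sys.getrefcount( s ) )      # 3

As far as I know, a refcount-of-one test of exactly this kind is what lets the
implementation resize the left operand of "+" in place, which would explain the
linear times in the first example.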


Cheers,

- Alf


