Performance of lists vs. list comprehensions

Gerald Britton gerald.britton at gmail.com
Tue Jan 19 11:10:43 EST 2010


Thanks!  Good explanation.

On Tue, Jan 19, 2010 at 10:57 AM, Alf P. Steinbach <alfps at start.no> wrote:
> * Gerald Britton:
>>
>> Yesterday I stumbled across some old code in a project I was working
>> on.  It does something like this:
>>
>> mystring = '\n'.join( [ line for line in lines if <some conditions
>> depending on line> ] )
>>
>> where "lines" is a simple list of strings.  I realized that the code
>> had been written before you could put a list comprehension as an
>> argument to join().  I figured that I could drop the '[' and ']' and
>> leave the rest as a list comprehension since that works with current
>> Python (and the project's base is 2.5).  So I rewrote the original
>> statement like this:
>>
>> mystring = '\n'.join( line for line in lines if <some conditions
>> depending on line> )
>>
>> It works as expected.  Then I got curious as to how it performs.  I
>> was surprised to learn that the rewritten expression runs more than
>> twice as _slowly_.  For example:
>>
>> >>> l
>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>> >>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
>> 2.9967339038848877
>> >>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
>> 7.2045478820800781
>>
>> Notice that I dropped the condition testing that was in my original
>> code.  I just wanted to see the effect of two different expressions.
>>
>> I thought that maybe there was some lower bound on the number of
>> items in the input list beyond which the generator expression
>> would prove more efficient.  There doesn't appear to be one.  I scaled
>> the length of the input list up to 1 million items and got more or
>> less the same relative performance.
>>
>> Now I'm really curious and I'd like to know:
>>
>> 1. Can anyone else confirm this observation?
>
> >>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
> 5.8625191190500345
> >>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
> 12.093135300715574
> >>> _
>
>
>> 2. Why should the "pure" list comprehension be slower than the same
>> comprehension enclosed in '[...]' ?
>
> Regarding (2) the unparenthesized expression in join is *not* a list
> comprehension but a generator expression.
>
> And as such it involves join calling next() on the generator object
> repeatedly, with each next() call involving a light-weight context shift.
>
> In addition the docs mumble something about "lazy" evaluation, and that may
> also contribute to the overhead.
>
> I think that in contrast, the interpreter can evaluate a list comprehension,
> [x for x in blah], directly without any context shifting, just by
> transforming it into equivalent inline loop code with the target
> expression innermost.
>
> And so the main factor causing a slowdown for a list comprehension would, I
> think, be paging and such if the list it produced were Really Huge, while for
> the generator there's no memory issue but rather a lot of calling and
> context shifting.
>
>
> Cheers & hth.,
>
> - Alf
> --
> http://mail.python.org/mailman/listinfo/python-list
>
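For anyone re-running the comparison today, here is a minimal reproduction sketch. It assumes Python 3, where map() returns an iterator, so the setup builds the list explicitly; absolute timings will differ by machine and interpreter version:

```python
from timeit import Timer

# Build the same ten-string list as in the thread; on Python 3,
# map(str, range(10)) would return an iterator, so use a list
# comprehension in the setup instead.
setup = "l = [str(i) for i in range(10)]"

# Time join() over a list comprehension vs. a generator expression.
t_list = Timer("' '.join([x for x in l])", setup).timeit(number=200_000)
t_gen = Timer("' '.join(x for x in l)", setup).timeit(number=200_000)

print(f"list comprehension:   {t_list:.3f}s")
print(f"generator expression: {t_gen:.3f}s")
```

Both forms produce an identical string; the difference lies entirely in how join() consumes the intermediate values. In CPython, str.join converts a non-sequence argument into a list before joining anyway, so the generator form pays the per-item next() overhead without saving memory here.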



-- 
Gerald Britton