Performance of list comprehensions vs. map

Thu Sep 6 07:38:07 EDT 2001

"Paul Winkler" <slinkp23 at yahoo.com> wrote in message
news:slrn9pct1n.2c5.slinkp23 at roaddog.armsnet...
    ...
> >List comprehensions were added to Python in version 2.0 as well. They
> >provide a syntactically more compact way of writing the above for loop:
> >
> >newlist = [s.upper() for s in list]
> >
> >It's not really any faster than the for loop version, however.
> >"""
>
> Hmm... I just played around a bit more and it is in fact a bit faster,
> but not much; as the size of the list grows, the time to do a list
> comprehension vs. the time to do a for loop seems to converge, while
> the time to do it via map is always fastest, sometimes by a lot.

I think one key issue is: map requires its sequence arguments to
support len(): it pre-allocates the needed list to the longest
len() out of those of its arguments (it can lengthen it later if
the len() were lying, I believe, but it gets its efficiency from
the fact it doesn't normally need to).  For loops, and also list
comprehensions, do not require len() support -- they build the
resulting list one append at a time.  This makes them more
flexible, but there's a price to pay for this.  Plus, of course,
map's loop is a tight C one, and the others are looping in
interpreted Python bytecode.

The effects appear to be of comparable magnitude -- for ex,
consider the little test program pertes.py:

class Evens:
    def __init__(self, n, claim=None):
        self.n = n
        self.len = claim
        if self.len is None: self.len = n
    def __len__(self):
        return self.len
    def __getitem__(self, index):
        if index>=self.n: raise IndexError, index
        return index*2

def funz(x): 23*x+17

def list_comp(n, claim=None, f=funz):
    seq = Evens(n, claim)
    return [f(x) for x in seq]

def for_loop(n, claim=None, f=funz):
    seq = Evens(n, claim)
    result = []
    for x in seq: result.append(f(x))
    return result

def use_map(n, claim=None, f=funz):
    seq = Evens(n, claim)
    return map(f, seq)

def pre_alloc(n, claim=None, f=funz):
    seq = Evens(n, claim)
    result = n*[0]
    i = 0
    for x in seq: result[i]=f(x); i+=1
    return result

import time

if __name__=='__main__':
    for f in list_comp,for_loop,use_map,pre_alloc:
        print "%s: "%f.__name__,
        n = 100*1000
        start = time.clock()
        result = f(n, n)
        stend = time.clock()
        print "%.2f"%(stend-start),
        start = time.clock()
        result = f(n, 1)
        stend = time.clock()
        print "%.2f"%(stend-start),
        print

On my old box, pretty repeatably:

D:\>python pertes.py
list_comp:  2.68 2.63
for_loop:  2.88 2.86
use_map:  1.49 2.05
pre_alloc:  1.79 1.77

D:\>python pertes.py
list_comp:  2.67 2.63
for_loop:  2.88 2.84
use_map:  1.49 2.05
pre_alloc:  1.79 1.77

I.e., pre-allocation (looping in Python) can't beat
map (looping in C) when the latter also pre-allocs,
but when map's own pre-allocation is defeated (as
the sequence is instructed to lie about its length)
then pre_alloc wins.  map is the only one of these
four solutions for which the correctness of seq's
length matters -- and it matters to the tune of
30% or thereabouts.  If a list comprehension was
also able to pre-allocate, it might be sped up by
a similar amount -- not all the way to map's speeds
in this case, but maybe to below 2 seconds in this
example.  That would require special-casing, with
comparison to the currently produced bytecode:

D:\>python -O
Python 2.1.1c1 (#19, Jul 13 2001, 00:25:06) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
Alternative ReadLine 1.4 -- Copyright 2001, Chris Gonnerman
>>> import dis
>>> import pertes
>>> dis.dis(pertes.list_comp)
          0 LOAD_GLOBAL              0 (Evens)
          3 LOAD_FAST                0 (n)
          6 LOAD_FAST                1 (claim)
          9 CALL_FUNCTION            2
         12 STORE_FAST               4 (seq)
         15 BUILD_LIST               0
         18 DUP_TOP
         19 LOAD_ATTR                4 (append)
         22 STORE_FAST               5 (_[1])
         25 LOAD_FAST                4 (seq)
         28 LOAD_CONST               1 (0)
    >>   31 FOR_LOOP                22 (to 56)
         34 STORE_FAST               3 (x)
         37 LOAD_FAST                5 (_[1])
         40 LOAD_FAST                2 (f)
         43 LOAD_FAST                3 (x)
         46 CALL_FUNCTION            1
         49 CALL_FUNCTION            1
         52 POP_TOP
         53 JUMP_ABSOLUTE           31
    >>   56 DELETE_FAST              5 (_[1])
         59 RETURN_VALUE
         60 LOAD_CONST               0 (None)
         63 RETURN_VALUE

Basically, in situation such as these where the list
comprehension has just one for, and no if, a call to
len() could be attempted on the sequence object in
the for, and, if that is satisfactory, the list could
be built with the indicated length, and instead of
append, a (currently non-existing) method indicating
"store at this index extending if necessary" could be
used.  The index would need to be tracked (FOR_LOOP
has it, internally, but doesn't make it available on
the outside -- another long-standing small wish!-),
AND the resulting list trimmed at the end if the
len() lied by eccess.

Looks like a tad too much trouble to me for a pretty
modest performance extra in a specific case -- the
extra ops to set things up and conditionally trim at
the end might also easily worsen the performance for
very small list comprehensions, I suspect.  Still,
those who disagree are of course welcome to try such
a patch and measure its performance effects:-).

Alex