Performance of list comprehensions vs. map

Wed Sep 5 17:29:45 EDT 2001

Chris,
you wrote:
> 
> Hi all,
> 
> I just took another look at:
> 
> http://musi-cal.mojam.com/~skip/python/fastpython.html
> 
> specifically, the loops section:
> 
> """
> Loops
> 
> Python supports a couple of looping constructs. The for statement is
> most commonly used. It loops over the elements of a sequence, assigning
> each to the loop
> variable. If the body of your loop is simple, the interpreter overhead
> of the for loop itself can be a substantial amount of the overhead. This
> is where the map
> function is handy. You can think of map as a for moved into C code. The
> only restriction is that the "loop body" of map must be a function call.
> 
> Here's a straightforward example. Instead of looping over a list of
> words and converting them to upper case:
> 
> newlist = []
> for word in list:
>     newlist.append(word.upper())
> 
> you can use map to push the loop from the interpreter into compiled C
> code:
> 
> import string
> newlist = map(string.upper, list)
> 
> List comprehensions were added to Python in version 2.0 as well. They
> provide a syntactically more compact way of writing the above for loop:
> 
> newlist = [s.upper() for s in list]
> 
> It's not really any faster than the for loop version, however.
> """
> 

For larger lists you might find that using append in the loop takes 
quadratic time and that the list comprehension allows the new list to be constructed in linear time.

> my question is: Why not? I like list comprehensions a lot, and they
> would seem to offer a way to get optimization over a for loop, much like
> map does, but apparently not. If map can do the loop in C, why can't a
> list comprehension?

Pass.

> In general, I like list comprehensions because they make Python a more
> "sequence oriented" language: When programming, it is very common to
> store a sequence of data of some type, and more often than not, do
> something with that whole sequence, rather than individual elements of
> that sequence. All that looping just to do the same thing to all the
> elements really clutters up the logic of the code. map and list
> comprehensions really help (as will element-wise operators, if we ever
> get them). However, it seems to me that sequence-oriented structures
> could really provide a performance benifit as well, but Python has not
> gotten there yet. I think a big stumbling block is that Python has no
> concept of the "homogenous" sequence. There are modules that provide
> such a  thing (Numeric, array), and indeed, the string type is a
> homogenous sequence, but Python itself does not understand this concept,
> so that if a map or list comprehesion is using a homogenous list, it
> still has to check the type of every element as it goes through the
> list. For example:
> 
> [2*a for a in list]
> 
> If the interpreter could just check once what the type of all the
> elements of list were, it would only have to figure out what 2*a meant
> once, and it could whip through the list very quickly. Indeed, if you
> do:
> 
> [2*a for a in a_string]
> 
> The string is a homogenous sequence by definition (once you check that
> it is a string), but the interpreter has no understanding of this
> concept.
> 
> NOTE: I fully understand that adding a concept like this allows for a
> whole set of optimizations that COULD be done, but someone would have to
> get around to doing it, which would be a whole lot of work, and might
> never happen. That's not a reason not to do it. First of all, the
> framework has to be in place for this kind of optimization before anyone
> can even start writing the code, and even if there are no optimizations
> ever built into the interpreter, the concept of a homogenous sequence
> would be very useful for extension writers. A number of times I have
> written an extension that needed a list if which all the elements where
> the same type. I had to put a type check inside my loop, which clutters
> up the code, and slows things down some. If I'm working with numvers, I
> use NumPy, but that's not always possible.
> 
> As for what might get added, I'm imagining that it should be possible to
> automatically maintain a "homogeous" flag on a sequence: for an
> imputable sequence it would get set when it was built. For a mutable
> sequence, each time an element is added, a type check could be done, and
> the homogenous flag turned off if the type didn't match. (turning it
> back on when that item was removed would probably not be worth it).
> Ideally, there would be a way for the programmer to define what was
> meant by homogenous (i.e.: a NumPy array of doubles, or any Numpy array)
> This would be trickier, but when in doubt, the homogenous flag would
> just be turned off, and we'd be back where we started.
> 
> Even if no changes were made to the list type, the concept could be put
> in place, and then existing homogenous sequences could be used more
> optimally.

This might work for sequences containing standard types. Instances of user
classes, however, may be changed while executing the loop over the sequence.
That means that the method that would be determined once to be executed
on all elements would not be the correct one for all elements accessed in
the loop.

Have fun,
Ype