Performance of list comprehensions vs. map

Chris Barker chrishbarker at home.net
Wed Sep 5 20:16:34 EDT 2001


Skip Montanaro wrote:

> map() is a single function that doesn't return control to the interpreter
> until its work is done.  List comprehensions are essentially just syntactic
> sugar for the equivalent for loop.

This is exactly my question: why is it implemented this way?

Because it was easy: A fine answer

or

Because it is important for the design that it work exactly the
same way: which I don't understand at all.

> In theory, an optimizer could identify and replace some list comprehensions
> with calls to map,

This is what I have in mind.
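For concreteness, here is a sketch of the three equivalent spellings under discussion. (Note one modern wrinkle: in Python 3, `map()` returns a lazy iterator, so `list()` is needed to get the same eagerly-built list.)

```python
# The three spellings being compared: an explicit for loop, a list
# comprehension, and map().  All produce the same list of strings.
seq = [1, 2, 3]

out_loop = []
for x in seq:
    out_loop.append(str(x))

out_comp = [str(x) for x in seq]   # "syntactic sugar" for the loop above

out_map = list(map(str, seq))      # list() needed in Python 3, where map is lazy

print(out_loop == out_comp == out_map)  # True
print(out_comp)                         # ['1', '2', '3']
```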

> but they wouldn't be quite equivalent.  In the examples
> above, there is no guarantee that the definition of the str function doesn't
> change during the execution of either the for loop or the list comprehension
> (think multiple threads of execution ;-).  The call to map, on the other
> hand, binds str once before map is called.

Again, is this a design decision? That the expression in a list
comprehension should be able to be changed mid-comprehension? In another
thread, to boot? This sounds like the worst kind of confusing and
unpredictable behaviour to me!!!

If it's OK from a design standpoint to require that the function in a
map is bound once, it seems pretty reasonable to have the same
requirement for a list comprehension. In fact, it seems like a darn good
idea! The fact that the current implementation doesn't enforce it is a
pretty poor design justification...or am I missing something?


> But there is no guarantee that the list you are operating on isn't modified
> by another thread during execution.

But there should be!!
 
> It's not a matter of lack of interest as much as a desire to keep the
> semantics of Python and lack of time.

The lack of time I can certainly understand. As for the semantics, I
think my proposal would make very little difference to the semantics,
except, perhaps, for allowing lists to be altered in the middle of a
list comprehension...which is pretty scary to me anyway. In fact, there
has been recent discussion about removing some of the extreme dynamism
from Python, in the form of being able to re-bind the __dict__ of a
class, when...OK, I'll admit it, I couldn't follow the discussion! Even
if you did want to continue to allow this kind of extreme dynamism, you
could simply allow people to turn off the homogeneous flag for a
sequence: bingo! You can change it in another thread to your heart's
content!

My point is that I imagine most sequences used in Python are, in fact,
homogeneous, but there is no way to know this, and it could make a huge
difference in performance. Look at NumPy. I simply could not use Python
without NumPy.
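As a latter-day stdlib illustration of the idea (not what the post itself proposes): Python's `array` module already provides homogeneous sequences, where every element must fit the declared type code, so the implementation can store elements unboxed and process them compactly.

```python
# The array module enforces homogeneity: every element must match the
# declared type code, so storage is compact (unboxed C values).
from array import array

a = array("d", [1.0, 2.0, 3.0])   # 'd' = C double
a.append(4.5)                      # fine: a float

rejected = False
try:
    a.append("not a number")       # violates homogeneity
except TypeError:
    rejected = True

print(rejected)     # True
print(a.tolist())   # [1.0, 2.0, 3.0, 4.5]
```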

> Here are some efforts I'm aware of.
> Note that there are lots of ways performance can be addressed, not just by
> constraining types.

Certainly. And when any kind of type constraining is suggested, what a
lot of people say is that what we REALLY need is an all-powerful
type-inferring engine...talk about time-to-implement issues! I think one
problem Python development does suffer from is that nothing gets done
unless it is reasonably doable with a fairly small amount of work, and
some things are simply too big to get done: like a type-inferring engine.

>     * Rattlesnake (my thing) - a new register-oriented virtual machine whose
>       premise is that the current stack-oriented virtual machine does far
>       too much data shuffling

sounds cool
 
>     * Psyco (Armin Rego) - a run-time specializing virtual machine that sees
>       what sorts of types are input to a function and compiles a type- or
>       value-specific version of that function on-the-fly.  I believe Armin
>       is looking at some JIT code generators in addition to or instead of
>       another virtual machine.

Wouldn't homogeneous sequences be a big help for this?
 
>     * Static typing SIG (http://www.python.org/sigs/types-sig/) - I'm not
>       familiar with this SIGs work, so you'll have to browse the archives.

This is a pretty quiet SIG these days. I know a while ago Guido said
that there would probably be some kind of static typing in Py3k, but
mostly for compile-time type checking, not performance. I haven't heard
much since then, and it looks to be further off. Personally, I think a
few static type declarations could help Py2C and the like a lot.

>     * Python2C (Bill Tutt, Greg Stein) - A "module compiler" that generates
>       a C version of a Python module.  http://www.mudlib.org/~rassilon/p2c/

I've been keeping my eye on this one, but it seems to be sleeping at the
moment... someone please correct me if I'm wrong. Homogeneous sequences
would be very handy here as well.

>     * Various peephole optimizers - I wrote one a few years ago.  There's
>       also Michael Hudson's bytecodehacks project (on Sourceforge).  The
>       xfreeze bytecode freezer (part of the standard distribution?) also
>       does some peepholing.

Hmm, I have no idea what peepholing is... I'll have to check that out
when I get the time.
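For anyone else curious: a peephole optimizer rewrites small local patterns in compiled code without changing its meaning. Constant folding is the classic example, and CPython applies it itself. This sketch shows that `2 * 3` is folded to `6` at compile time, so the bytecode never performs the multiplication.

```python
# CPython folds constant expressions when compiling: the bytecode for
# "x = 2 * 3" loads the constant 6 directly instead of multiplying.
import dis

code = compile("x = 2 * 3", "<example>", "exec")
print(6 in code.co_consts)   # True: the folded result is stored as a constant
dis.dis(code)                # disassembly shows LOAD_CONST for 6
```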


-Chris


-- 
Christopher Barker,
Ph.D.                                                           
ChrisHBarker at home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------
