Performance of list comprehensions vs. map
Skip Montanaro
skip at pobox.com
Wed Sep 5 15:21:27 EDT 2001
Chris> I just took another look at:
Chris> http://musi-cal.mojam.com/~skip/python/fastpython.html
Chris> specifically, the loops section:
fp.html> List comprehensions were added to Python in version 2.0 as
fp.html> well. They provide a syntactically more compact way of writing
fp.html> the above for loop:
fp.html> newlist = [s.upper() for s in list]
fp.html> It's not really any faster than the for loop version, however.
Chris> my question is: Why not?
map() is a single function that doesn't return control to the interpreter
until its work is done. List comprehensions are essentially just syntactic
sugar for the equivalent for loop. Both the for loop and list comprehension
are implemented as loops using multiple virtual machine instructions.
Perhaps the easiest way to see this is to write three equivalent functions
and disassemble them:
def map_str(n):
    return map(str, range(n))

def lc_str(n):
    return [str(i) for i in range(n)]

def for_str(n):
    l = []
    append = l.append
    for i in range(n):
        append(str(i))
    return l

assert map_str(5) == lc_str(5) == for_str(5)
import dis
print "disassembling map_str"
dis.dis(map_str)
print "disassembling lc_str"
dis.dis(lc_str)
print "disassembling for_str"
dis.dis(for_str)
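On a modern interpreter you can make the same point programmatically instead
of eyeballing the disassembly: dis.get_instructions (Python 3.4+) shows which
functions contain a bytecode-level loop. A sketch (has_loop is my own helper,
not a library function; list(map(...)) is used so it also runs where map is
lazy):

```python
import dis

def has_loop(code):
    """Return True if a code object (or any comprehension code object
    nested inside its constants) contains the FOR_ITER loop opcode."""
    if any(ins.opname == "FOR_ITER" for ins in dis.get_instructions(code)):
        return True
    return any(has_loop(c) for c in code.co_consts if hasattr(c, "co_code"))

def map_str(n):
    return list(map(str, range(n)))

def lc_str(n):
    return [str(i) for i in range(n)]

print(has_loop(map_str.__code__))  # False - the loop runs inside map, in C
print(has_loop(lc_str.__code__))   # True - the comprehension loops in bytecode
```

The recursion into co_consts matters because, before Python 3.12,
comprehensions compile to a nested code object rather than inline bytecode.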
Chris> I like list comprehensions a lot, and they would seem to offer a
Chris> way to get optimization over a for loop, much like map does, but
Chris> apparently not. If map can do the loop in C, why can't a list
Chris> comprehension?
In theory, an optimizer could identify and replace some list comprehensions
with calls to map, but they wouldn't be quite equivalent. In the examples
above, there is no guarantee that the definition of the str function doesn't
change during the execution of either the for loop or the list comprehension
(think multiple threads of execution ;-). The call to map, on the other
hand, binds str once before map is called.
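A small sketch of that difference (slow and fast are names I invented for
illustration): the comprehension looks the function name up on every
iteration, while map evaluates its function argument exactly once, before
the loop starts.

```python
def fast(x):
    return x * 100

def slow(x):
    global slow
    slow = fast          # rebind the global name on the first call
    return x * 2

# The comprehension looks up the name "slow" on each iteration,
# so the rebinding takes effect mid-loop:
lc = [slow(x) for x in range(3)]
print(lc)                # [0, 100, 200]

# Reset, then hand the function object to map.  map binds the object
# once, up front, so rebinding the name changes nothing:
def slow(x):
    global slow
    slow = fast
    return x * 2

mapped = list(map(slow, range(3)))
print(mapped)            # [0, 2, 4]
```

In the email's terms: the comprehension behaves like the for loop (a fresh
lookup of str on every pass), while map captured str before it began.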
Chris> I think a big stumbling block is that Python has no concept of
Chris> the "homogeneous" sequence. There are modules that provide such a
Chris> thing (Numeric, array), and indeed, the string type is a
Chris> homogeneous sequence, but Python itself does not understand this
Chris> concept, so that if a map or list comprehension is using a
Chris> homogeneous list, it still has to check the type of every element
Chris> as it goes through the list. For example:
Chris> [2*a for a in list]
Chris> If the interpreter could just check once what the type of all the
Chris> elements of list were, it would only have to figure out what 2*a
Chris> meant once, and it could whip through the list very
Chris> quickly. Indeed, if you do:
Chris> [2*a for a in a_string]
But there is no guarantee that the list you are operating on isn't modified
by another thread during execution.
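That hazard doesn't even require threads. A sketch (double_and_grow is an
invented name) in which the function being applied mutates the source list
while the comprehension is still iterating over it:

```python
data = [1, 2, 3]

def double_and_grow(a):
    if a == 1:
        data.append(99)   # mutate the list mid-iteration
    return 2 * a

result = [double_and_grow(a) for a in data]
print(result)             # [2, 4, 6, 198] - the appended element is visited
```

Because the length and contents of the list can change under the
interpreter's feet like this, it cannot safely hoist per-element type checks
out of the loop.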
I've been thinking of this a little in the context of Ken Simpson's recently
announced PyInline module. The biggest reason I can't use it right now
(ignoring its pre-alpha version number ;-) in my own code is precisely that
I operate on dicts, lists and tuples a lot. I'm sure that's true for other
Python programmers. The problem of easily writing C code that can operate
on lists (for example) boils down to deciding how you will map Python's
lists to something that is efficiently representable in C. Perl's inline
module does this with something called typemaps. (Look in something like
/usr/local/lib/perl5/5.6.1/ExtUtils for a file called "typemap".) I don't
really understand everything that's going on in that file, but it does
provide mappings. For instance, a "char **" argument followed by an
"unsigned int" arg might map to and from a list of strings and a length, an
"int *" followed by "unsigned int" might map to and from a list of ints,
etc. I'm not sure quite how you'd do dicts (or if you could in a reasonable
fashion without listifying them somehow).
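For a modern illustration of the kind of mapping a typemap encodes, ctypes
(not available in 2001, so treat this as an after-the-fact sketch) can build
exactly the (char **, unsigned int) pair described above from a Python list
of strings:

```python
import ctypes

# Map a Python list of strings to a C-side char ** plus a length:
strings = ["spam", "eggs"]
c_strings = (ctypes.c_char_p * len(strings))(*[s.encode() for s in strings])
c_length = ctypes.c_uint(len(strings))

print(c_strings[0], c_strings[1], c_length.value)
```

The conversion is explicit and per-call; a typemap mechanism would let an
inline-C tool generate this marshalling automatically.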
I suspect this is about as close as Python will ever come to understanding
homogeneous containers at the language level.
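For what it's worth, the array module Chris mentions is the standard
library's closest approximation today: it stores homogeneous numeric data in
a C array, though Python still boxes each element on access. A minimal
sketch:

```python
from array import array

ints = array("i", range(5))   # typecode "i": C signed int storage
ints.append(7)
print(ints.typecode, list(ints))

# The type constraint is enforced at insertion time:
try:
    ints.append("not an int")
except TypeError:
    print("rejected non-int")
```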
Chris> I've proposed similar ideas in the past, and not gotten much
Chris> response. I imagine my ideas are full of holes, but no one has
Chris> taken the time to point them out to me yet. I have come to the
Chris> conclusion that the really sharp minds on this list (and the
Chris> folks that might be qualified to actually implement such a thing)
Chris> are not really interested in something like this that is a
Chris> performance-only improvement. Am I right? Or is the idea just so
Chris> stupid that it's not worth commenting on?
It's not so much a lack of interest as a desire to preserve Python's
semantics, plus a simple lack of time. Here are some efforts I'm aware of.
Note that there are lots of ways performance can be addressed, not just by
constraining types.
  * Rattlesnake (my thing) - a new register-oriented virtual machine whose
    premise is that the current stack-oriented virtual machine does far
    too much data shuffling.

  * Psyco (Armin Rigo) - a run-time specializing virtual machine that sees
    what sorts of types are input to a function and compiles a type- or
    value-specific version of that function on-the-fly.  I believe Armin
    is looking at some JIT code generators in addition to or instead of
    another virtual machine.

  * Static typing SIG (http://www.python.org/sigs/types-sig/) - I'm not
    familiar with this SIG's work, so you'll have to browse the archives.

  * Python2C (Bill Tutt, Greg Stein) - a "module compiler" that generates
    a C version of a Python module.  http://www.mudlib.org/~rassilon/p2c/

  * PEP 266 - a "pie in the sky" PEP I wrote that proposes that global
    data could be accessed like local data if you reverse the roles of who
    keeps references up-to-date.

  * Various peephole optimizers - I wrote one a few years ago.  There's
    also Michael Hudson's bytecodehacks project (on Sourceforge).  The
    xfreeze bytecode freezer (part of the standard distribution?) also
    does some peepholing.
I'm sure there are several I'm missing.
--
Skip Montanaro (skip at pobox.com)
http://www.mojam.com/
http://www.musi-cal.com/