Python is faster than C

Armin Rigo arigo at tunes.org
Sat Apr 3 17:31:59 EST 2004


Hello Robert,

On Sat, Apr 03, 2004 at 12:30:38PM -0800, Robert Brewer wrote:
> > enumerate() should return a normal list, and
> > it should be someone else's job to ensure that it is 
> > correctly optimized away if possible
> 
> I'd like to think I'm not understanding your point, but you made it so
> danged *clear*.
> 
> Enumerate should absolutely *not* return a normal list.

You missed my point indeed.  There are two levels here: one is the language
specification (the programmer's experience), and one is the CPython
implementation.  My point is that with some more cleverness in the
implementation, iterators would be much less needed at the language
specification level (I'm not saying never; I think generators are great, for
example).

> The use case I think you're missing is when I do not want the enumeration
> optimized at all; I want it performed on-the-fly on purpose:

This is what I mean by "optimized": done lazily, on-the-fly.  I want better
implementations of lists, callbacks-on-changes, static bytecode analysis, and
more.  I don't want any notion other than lists at the language level.  Your
example:

> for i, line in enumerate(file('40GB.csv')):

is among the easiest to optimize, even if the language specification said that
enumerate returns a list.  I can think of several ways to do that.  For
example, because the result of enumerate() is only ever used in a for loop, the
implementation knows it can internally return an iterator instead of the whole
list.  There
are some difficulties, but nothing critical.  Another option which is harder
in CPython but which we are experimenting with in PyPy would be to return a
Python object of type 'list' but with a different, lazy implementation.
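
To make that last idea concrete, here is a rough sketch in plain Python (purely
illustrative, nothing like PyPy's actual machinery; the class name
LazyEnumeration is invented, and things like slicing are skipped) of an object
that claims to be the list of (index, item) pairs but only walks the underlying
iterable on demand:

class LazyEnumeration:
    """Pretends to be the list that enumerate() would return, but lazily."""

    def __init__(self, iterable):
        self._source = iterable
        self._materialized = None   # real list, built only if really needed

    def __iter__(self):
        # The common case, a for loop: no list is ever built.
        if self._materialized is not None:
            return iter(self._materialized)
        return self._generate()

    def _generate(self):
        for index, item in enumerate(self._source):
            yield index, item

    def _force(self):
        # Somebody wants random access after all: build the real list.
        # (A serious implementation would also have to replay pairs that
        # an earlier for loop already consumed; the sketch skips that.)
        if self._materialized is None:
            self._materialized = list(self._generate())
        return self._materialized

    def __len__(self):
        return len(self._force())

    def __getitem__(self, i):
        return self._force()[i]


# A plain for loop over it never holds more than one item at a time:
fake_file = ["first\n", "second\n", "third\n"]
pairs = []
for i, line in LazyEnumeration(fake_file):
    pairs.append((i, line))
assert pairs == [(0, "first\n"), (1, "second\n"), (2, "third\n")]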

> Forcing enumerate to return a list would drag not only the entire
> 40GB.csv into memory, but also the entire set of i. Using an iterator in
> this case instead of a list *is* the optimization.

Yes, and I'm ranting against the idea that the programmer should be bothered
about it, when it could be made just as efficient automatically.  From the
programmer's
perspective, iterators are mostly like a sequence that you can only access
once and in order.  A better implementation can figure out for itself when you
are only accessing this sequence once and in order.  I mean, it is just like
range(1000000), which is a list all right, but there is no reason why this
list should consume 4MB of CPython's memory when the same information can be
encoded in a couple of ints, as long as you don't change the list.  The
language doesn't need xrange() -- it is an implementation issue that shows up
in the Python language.
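
Here is the same kind of sketch for range() (again just an illustration; the
class name IntRangeList is invented and most list methods are left out): an
object that claims to be the list of ints 0..n-1 but stores only n, and
silently expands into a genuine list the first time it is mutated.

class IntRangeList:
    """Acts like the list returned by range(n), storing only n until mutated."""

    def __init__(self, n):
        self._n = n
        self._expanded = None   # genuine list, built only on mutation

    def _as_list(self):
        if self._expanded is None:
            self._expanded = list(range(self._n))
        return self._expanded

    def __len__(self):
        if self._expanded is None:
            return self._n
        return len(self._expanded)

    def __getitem__(self, i):
        if self._expanded is not None:
            return self._expanded[i]
        if not -self._n <= i < self._n:
            raise IndexError("list index out of range")
        return i % self._n     # consecutive ints: the index *is* the value

    def __setitem__(self, i, value):
        # First mutation: quietly fall back to a real list.
        self._as_list()[i] = value

    def append(self, value):
        self._as_list().append(value)


# Indexing and iteration cost a couple of ints of memory; only an
# actual mutation expands it into a real million-element list.
r = IntRangeList(1000000)
assert r[123456] == 123456 and r[-1] == 999999
r[0] = 42                      # now it becomes a genuine list
assert len(r) == 1000000 and r[0] == 42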


Armin
