I'm missing something here with range vs. xrange

Mon Dec 10 11:55:47 EST 2007

On Dec 7, 2007 8:58 PM, Joe Goldthwaite <joe at goldthwaites.com> wrote:
> >You can't imagine why someone might prefer an iterative solution over
> >a greedy one? Depending on the conditions, the cost of creating the
> >list can be a greater or a lesser part of the total time spent. Actual
> >iteration is essentially the same cost for both. Try looking at memory
> >usage while you're running these tests.
>
> I can imagine why someone would want to use in iterative solution.  What I
> don't understand is why there's so little difference between the two.
> Here's what I think is going on;
>
> To do a range(1000000)
>
>         1. Allocate a big chunk of memory
>         2. Fill the memory with the range of integers
>         3. Setup an index into the memory array
>         4. Every time the program asks for the next item, bump
>            the memory index by one, get the value and return it.
>

No. What range does is create and return a list of integers. The
iteration is a separate step, and is traditional list iteration. In
longhand, for x in range(y): looks something like this:

range = []
#this loop happens in C and is super-fast for small values of y, it
#gets much much slower when y gets out of the range of pre-allocated list pools
#and the small integer cache
while y >= 0:
  range.append(y)
  y -= 1

for x in range:
  #blah

List iteration works more or less as you describe, since it's implemented in C.

What you describe is more or less how psyco optimizes the use of
range(), though.

> To do an xrange(10000000)
>
>         1. Set up a counter
>         2. Every time the program asks for the next item, increment
>          the counter and return it.
>

This is essentially correct, but remember that it's constructing and
returning an int object each time. This is exactly the same thing that
the range() constructor does, it just happens lazily.

> I guess I thought that between these two methods, the second would be
> dramatically faster. An order of magnitude faster.  My surprise is that it's
> not.  That's why I figured I'm missing something because the above steps
> don't seem to describe what's going on.  Hence, the title, "I'm missing
> something".
>

The creation of the xrange object is massively faster - at least an
order of magnitude. However, once you actually iterate over both of
them you're normalizing almost all of the overhead that range has. The
advantage of xrange is the reduced memory pressure (and the improved
performance that often entails).