I'm missing something here with range vs. xrange

Hrvoje Niksic hniksic at xemacs.org
Mon Dec 10 04:29:55 EST 2007


"Joe Goldthwaite" <joe at goldthwaites.com> writes:

> To do a range(1000000)
>
> 	1. Allocate a big chunk of memory
> 	2. Fill the memory with the range of integers
> 	3. Setup an index into the memory array
> 	4. Every time the program asks for the next item, bump
> 	   the memory index by one, get the value and return it.
>
> To do an xrange(1000000)
>
> 	1. Set up a counter
> 	2. Every time the program asks for the next item, increment
>          the counter and return it.
>
> I guess I thought that between these two methods, the second would be
> dramatically faster. An order of magnitude faster.
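The counter-based behaviour described above can be sketched as a generator in a few lines (my_xrange is a hypothetical name, not part of any library):

```python
def my_xrange(stop):
    # Counter-based iteration: yield each value on demand,
    # never materializing the full list in memory first.
    i = 0
    while i < stop:
        yield i
        i += 1

print(sum(my_xrange(1000000)))  # 499999500000, same as sum over range
```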

You're forgetting that your test iterates over all the elements in
both cases.  The accumulated cost of each iteration simply obscures
the added cost of ahead-of-time allocation and list setup, especially
as the latter is implemented in fairly optimized C.

In other words, the complexity of the task is the same in both cases,
O(n), and your timing reflects that.  When you think about it that
way, there is no reason for the entire loop to be an order of
magnitude slower -- in fact, if it were, it would be a bug in "range".
"range" will only become an order (or more) of magnitude slower when
you start allocating extremely large lists and it starts having to
allocate virtual memory off the system's swap.
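A quick way to see the allocation difference is to compare object sizes. (Sketch in Python 3 terms, where range is the lazy object, like the old xrange, and list(range(n)) plays the role of the old range; exact byte counts are CPython-specific.)

```python
import sys

lazy = range(1000000)        # constant-size object, stores no elements
full = list(range(1000000))  # one pointer per element, allocated up front

print(sys.getsizeof(lazy))   # a few dozen bytes, independent of n
print(sys.getsizeof(full))   # roughly 8 MB of pointers on CPython
```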

The operation where xrange excels is the list setup step:

$ python -m timeit 'xrange(1000000)'
1000000 loops, best of 3: 0.463 usec per loop
$ python -m timeit 'range(1000000)'
10 loops, best of 3: 35.8 msec per loop

xrange finishes in sub-microsecond time, while range takes tens of
milliseconds, roughly 77,000 times slower.  This result follows your
reasoning: because range needs to allocate and set up the list in
advance, it is orders of magnitude slower.
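Running the same comparison while iterating over every element shows the gap collapsing, since the O(n) loop dominates both timings. (Sketch in Python 3 terms, where range is the lazy iterable and list(range(n)) mimics the old eager range; absolute numbers depend on the machine.)

```python
import timeit

n = 100000
# Lazy iteration (Python 3 range ~ Python 2 xrange)
t_lazy = timeit.timeit('for i in range(n): pass',
                       globals={'n': n}, number=20)
# Eager: build the whole list first (mimics Python 2 range)
t_list = timeit.timeit('for i in list(range(n)): pass',
                       globals={'n': n}, number=20)

print(t_lazy, t_list)  # same order of magnitude, nothing like 77000x apart
```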

> Hence, the title, "I'm missing something".

I hope this clears it up for you.
