[Python-Dev] xrange accepting non-ints

Guido van Rossum guido at python.org
Thu Aug 24 17:58:19 CEST 2006


On 8/24/06, Neal Norwitz <nnorwitz at gmail.com> wrote:
> I've been working on enhancing xrange and there are a bunch of issues
> to consider.  I've got pretty much complete implementations in both C
> and Python.  Currently xrange is 2 objects:  range and the iter.
> These only work on C longs.  Here's what I propose:
>
> 2.6:
>  * Add deprecation warning if a float is passed to xrange (currently
> silently truncates)
>  * Allow int, long, float, or obj.__index__

float? I thought the first bullet says no float?

>  * Implement xrange in python

Since xrange is used in performance critical apps that may be a bad
idea. Or maybe only if the args aren't all ints?

>  * Implement iterator over C longs (or Py_ssize_t) in C
>  * Implement iterator over Python longs in Python (* may lose __length_hint__)
>  * Capture the values on construction, so mutating objects wouldn't
> affect xrange

Right. So capture them as Python int or long only.

> The idea is to expand xrange's capabilities so that it can replace range in 3k.
>
> I've profiled various combinations.  Here are the various results
> normalized doing xrange(0, 1e6, 1):
>
> Run on all integer (32-bit) values for start, step, end:
> C xrange and iter:  1
> Py xrange w/C iter: 1
> Py xrange w/Py iter (gen): 5-8
> Py xrange w/Py iter (class): ~30
>
> So having xrange in python is the same speed as if xrange is written
> in C.

I'm not sure I believe this benchmark -- to measure the cost of xrange
itself (as opposed to the cost of iterating over the iterator) you
should test xrange(0) or xrange(1).

> The important part is that the iterator needs to be written in
> C for speed.  If we use a generator, something like:
>
>         while value < end:
>             yield value
>             value += step
>
> The result is ~5 times slower in a release build and 8 times slower in
> a debug build than with an iterator implemented in C.

Nobody cares about the speed of the debug build. :-)

> Using a
> generator means that there is no __length_hint__.  If we go with a
> full class that has a __length_hint__ the result was ~32 times slower
> in a debug build.

I don't mind not having the length hint for longs.

Doesn't current xrange also support a faster version of reversed?

> The Python impl is about 1/10th the size of the C impl, though is
> lacking some comments.
>
> Run on Python longs the result is somewhat interesting.  The Python
> based iterator is faster.  There's probably a bug in the C version,
> but given that there is a lot of object allocation, I wouldn't expect
> the C version to ever be much faster than a similar Python version.
> Plus the Python version is trivial (same as above) for ints or longs.
> The C version for longs is quite a bit of code.

If the longs are large enough, the addition is going to dominate the cost, yes.

> Run on all Python longs (still 0..1e6, but sys.maxint..(sys.maxint +
> 1e6) is the same):
> C xrange and iter:  1.4
> Py xrange w/C iter: not tested
> Py xrange w/Py iter (gen): 1
> Py xrange w/Py iter (class): 4
>
> Caveats:
>  * The generator version above doesn't support floats.  We could
> easily support floats with a  different calculation that would be
> slightly more expensive, but not have accumulated error.

Is there a good use case for supporting float? The problem with floats
is that even apart from accumulated error, it's still ambiguous. E.g.
will xrange(1.1, 1.9, 0.1) include the end point or not? That would
depend on the details of decimal-to-binary conversion.

>  * By using the generator version, __length_hint__ gets dropped.  This
> means that converting the iterator into a sequence could be slightly
> more costly as we have to increase the allocation.  This would only
> happen if any of start, step, end weren't an int.

Fine with me -- you can't do that at all at the moment. :-)

>  * With a python implementation there is a little bit of bootstraping
> that is necessary to get the iter implemented in C into the xrange
> object implemented in Python

Long-term, I'd rather see it implemented all in C. Short term, the
Python implementation is great to experiment.

>  * Since xrange is implemented in Python, it can be changed.
>  * The Python code is much easier to understand than the C code (there
> is at least one bug in the current C version where -sys.maxint -1
> isn't always displayed properly).
>
> Hopefully this is all understandable.  If I left anything out, Thomas
> will remind me.

I'm about to head out to Google now...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


More information about the Python-Dev mailing list