[Python-3000] range() issues

Wed Apr 30 04:16:53 CEST 2008

On Tue, Apr 29, 2008 at 7:48 PM, Guido van Rossum <guido at python.org> wrote:
..
>  Let's just stop the discussion here and kill all proposals to add
>  indexing/slicing etc. Sorry, Alexander, but there just isn't anyone
>  besides you in favor, and nobody has brought up a convincing use case.
>

That's fair, but let me wrap up by rehashing the current state of affairs.

1. Both 2.x xrange and 3.x range support indexing.  A comment in the
py3k branch says "range(...)[x] is necessary for:  seq[:] = range(...),"
but this is apparently wrong:

>>> x = []
>>> x[:] = iter([1,2,3])
>>> x
[1, 2, 3]

2. In 3.x, ranges longer than sys.maxsize are allowed, but cannot be
indexed even with small indexes; for example, range(2**100)[0] raises
an OverflowError.  There is little justification for this behavior.  A
3-line patch can fix the situation for small indexes, and Amaury
demonstrated [1] that with some effort arbitrary indexes can be
supported.

[1] http://bugs.python.org/file10109/anyrange.patch
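
To illustrate why nothing in the arithmetic itself forces the
OverflowError, here is a minimal pure-Python sketch.  It is not the
patch referenced in [1]; the helper names range_len and range_item are
hypothetical, and the only assumption is that the length and the index
are kept as unbounded Python ints instead of being squeezed into a C
integer.

```python
def range_len(start, stop, step):
    """Number of items in range(start, stop, step), as an unbounded int."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        return max(0, (stop - start + step - 1) // step)
    return max(0, (start - stop - step - 1) // (-step))


def range_item(start, stop, step, i):
    """Item i of range(start, stop, step); negative i counts from the end."""
    n = range_len(start, stop, step)
    if i < 0:
        i += n
    if not 0 <= i < n:
        raise IndexError("range index out of range")
    # The only computation indexing needs.
    return start + i * step


print(range_item(0, 2**100, 1, 0))                   # 0
print(range_item(0, 2**100, 1, -1) == 2**100 - 1)    # True
```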

3. There is an ongoing debate [2] on how comparison and hashing should
be implemented for range objects.

My point is that the current 3.x implementation is neither here nor
there.  It is not simple, and it does not even do what its
documentation says:

>>> print(range.__doc__)
range([start,] stop[, step]) -> range object

Returns an iterator that generates the numbers in the range on demand.
>>> range(10).__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'range' object has no attribute '__next__'

It supports some sequence methods (len and subscripting), but not
others (__contains__ and slicing).

My use case for making range a Sequence is as follows.  I frequently
deal with data organized in column-oriented tables.  These tables
often need a column that represents the row number.  A range object
would allow an efficient representation of such a column, but having
such a virtual column in the table would mean that generic sequence
manipulation functions will not work on some columns.

This is not a strong itch, though.  While virtualizing the row number
column using range() is an attractive solution, in practice the memory
savings compared to numpy's arange() (or array('i', range(..))) are
not that significant.  However, if slicing support is axed based on
complexity considerations, I don't see how supporting indexing can be
justified.  Moreover, since indexing and slicing can reuse the same
start + i*step computation, the incremental code complexity of slicing
support is small, so for me the two go hand in hand.  For these
reasons, I believe that either of the following alternatives is better
than the status quo:

1. Make range(..) return a Sequence.

2. Make range(..) return an Iterator.  (While I prefer #1, there are
several advantages of this proposal: in the common list(range(..)) and
for i in range(..) cases, creation of an intermediate object will go
away; we will stop debating what hash(range(..)) should return [2];
and finally we will not need to change the docstring :-).)

[2] http://bugs.python.org/issue2603
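
To make the "hand in hand" claim about indexing and slicing concrete,
here is a rough sketch of alternative #1.  The class name SeqRange and
its _item helper are hypothetical and this is not code from the py3k
branch; it also assumes lengths that fit in sys.maxsize, since it
relies on len().  Both branches of __getitem__ go through the same
start + i*step computation, and a slice simply yields another range.

```python
class SeqRange:
    """Hypothetical range-like sequence supporting indexing and slicing."""

    def __init__(self, start, stop, step=1):
        if step == 0:
            raise ValueError("step must not be zero")
        self.start, self.stop, self.step = start, stop, step

    def __len__(self):
        if self.step > 0:
            span = self.stop - self.start
        else:
            span = self.start - self.stop
        # Ceiling division, clamped at zero for empty ranges.
        return max(0, -(-span // abs(self.step)))

    def _item(self, i):
        # Shared by indexing and slicing.
        return self.start + i * self.step

    def __getitem__(self, key):
        n = len(self)
        if isinstance(key, slice):
            i, j, k = key.indices(n)
            # The slice bounds are mapped with the same formula as items.
            return SeqRange(self._item(i), self._item(j), k * self.step)
        if key < 0:
            key += n
        if not 0 <= key < n:
            raise IndexError("range index out of range")
        return self._item(key)


r = SeqRange(0, 10)
print(r[3])               # 3
print(list(r[2:8:2]))     # [2, 4, 6]
print(list(r[::-1])[:3])  # [9, 8, 7]
```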

>  __len__ will always be problematic when there are more values than can
>  be counted in a signed C long; maybe we should do what the Java
>  collections package does: for once, Java chooses practicality over
>  purity, and simply states that if the length doesn't fit, the largest
>  number that does fit is returned (i.e. for us that would be
>  sys.maxsize in 3.0, sys.maxint in 2.x).

This is another simple way to fix the range(2**100)[0] buglet.
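
A small sketch of what that could look like, assuming a hypothetical
ClampedRange class with a true_len helper (neither exists in the py3k
branch): len() saturates at sys.maxsize, while indexing checks bounds
against the exact length.

```python
import sys


class ClampedRange:
    """Hypothetical range-like object whose len() saturates at sys.maxsize."""

    def __init__(self, start, stop, step=1):
        if step == 0:
            raise ValueError("step must not be zero")
        self.start, self.stop, self.step = start, stop, step

    def true_len(self):
        # Exact length as an unbounded Python int.
        if self.step > 0:
            span = self.stop - self.start
        else:
            span = self.start - self.stop
        return max(0, -(-span // abs(self.step)))

    def __len__(self):
        # Report the largest value that fits when the true length does not.
        return min(self.true_len(), sys.maxsize)

    def __getitem__(self, i):
        n = self.true_len()
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("range index out of range")
        return self.start + i * self.step


big = ClampedRange(0, 2**100)
print(len(big) == sys.maxsize)  # True
print(big[0])                   # 0
```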