[ python-Feature Requests-1003935 ] xrange overflows

Tue Jul 12 00:24:40 CEST 2005

Feature Requests item #1003935, was opened at 2004-08-05 15:16
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1003935&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Hallvard B Furuseth (hfuru)
Assigned to: Raymond Hettinger (rhettinger)
Summary: xrange overflows

Initial Comment:
These restrictions are undocumented both in the
xrange doc string and in the reference manual
(Info node 'XRange Type'):

  >>> xrange(maxint, maxint + 10)
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  OverflowError: long int too large to convert to int

  >>> xrange(-100, maxint)
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  OverflowError: xrange() result has too many items

I hope the overflows below are bugs and not
features.  It works if 3/-3 is replaced with 1/-1:

  >>> xrange(0, maxint, 3)
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  OverflowError: integer addition

  >>> xrange(0, -maxint, -3)
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  OverflowError: integer addition

Python installation:

  Python 2.3.3 (#1, May 25 2004, 20:22:36) 
  [GCC 3.2.3] on sunos5
  Type "help", "copyright", "credits" or "license"
for    more information.
  >>> from sys import maxint
  >>> "%x" % maxint
  '7fffffff'

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2005-07-12 00:24

Message:
Logged In: YES 
user_id=21627

Using xrange for an infinite loop qualifies as "cute" =
"obviously straining for effect". The most natural way of an
infinite loop is "while True". There are certainly other
ways to express an infinite loop (like reading from
/dev/zero, or writing an unbounded recursion), but arguing
that xrange is "much faster" is obviously straining for
effect: If the loop is infinite, how can it be much faster?
And why does it matter if it is? (in my measurements, it is
30% faster, counting the time for a given number of iterations).

----------------------------------------------------------------------

Comment By: Hallvard B Furuseth (hfuru)
Date: 2005-07-11 12:16

Message:
Logged In: YES 
user_id=726647

What is "cute" about using maxint as an xrange() limit,
and why would it impact its performance to return its 2nd
parameter as endvalue +/- 1 (an int) instead of trying to
return endvalue + step (which can be long)?

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2005-07-11 01:11

Message:
Logged In: YES 
user_id=80475

Will look at it to see if there is something simple that can
be done.  These were design decisions  -- xrange() and
count() are supposed to be simple and fast -- with other
tools being preferred if you are pushing beyond the limit of
normal use cases.  IOW, it is not worth crippling their
performance just because someone has discovered cute ways of
using sys.maxint.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2005-07-10 23:19

Message:
Logged In: YES 
user_id=31435

Unassigned myself, since I don't plan to do anything else 
here.

----------------------------------------------------------------------

Comment By: Connelly (connelly)
Date: 2004-12-06 08:04

Message:
Logged In: YES 
user_id=1039782

Yes, I run into this bug all the time.  For example, I may 
want to generate all strings of length n:

BINARY_CHARSET = ''.join([chr(i) for i in range(256)])

def all_strs(n, charset=BINARY_CHARSET):
  m = len(charset)
  for i in xrange(m**n):
    yield ''.join([charset[i//m**j%m] for j in range(n)])

This is correct algorithmically, but fails due to the buggy 
Python xrange() function.  So I end up writing my own irange
() function.

Other cases where I've run into this problem: Sieving for 
primes starting at a given large prime, checking for integer 
solutions to an equation starting with a large integer, and 
searching very large integer sets.

itertools.count and itertools.repeat are similarly annoying.  
Often I want to search the list of all positive integers starting 
with 1, and have to use an awkward while loop to do this, or 
write my own replacement for itertools.count.

There should be little performance hit for apps which use 
xrange(n), where n fits in the system's integer size.  xrange
() can just return an iterator which returns system ints, and 
the only performance penalty is an extra if statement in the 
call to xrange (and there is no performance penalty in the 
iterator's next() method).

Finally, this move appears consistent with Python's values, ie 
simplicity, long ints supported with no extra effort, avoid 
gotchas for newbies, no special cases, etc.

----------------------------------------------------------------------

Comment By: Hallvard B Furuseth (hfuru)
Date: 2004-08-08 20:09

Message:
Logged In: YES 
user_id=726647

Here is why repr() is relevant - though the error message
was certainly weird.  With the latest CVS version:

  >>> xrange(maxint-1, maxint, 2)
  xrange(2147483646, -2147483648, 2)

Which is why I suggested to return last+step/abs(step)
as the 2nd argument.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-08-08 09:20

Message:
Logged In: YES 
user_id=31435

Changed range_new() to stop using PyRange_New().  This 
fixes a variety of bogus errors.

Documented that xrange() is intended to be simple and fast, 
and that CPython restricts its arguments, and length of its 
result sequence, to native C longs.

Added some tests that failed before the patch, and repaired 
a test that relied on a bogus OverflowError getting raised.

Doc/lib/libfuncs.tex; new revision: 1.171
Lib/test/test_xrange.py; new revision: 1.2
Objects/rangeobject.c; new revision: 2.53

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-08-06 07:10

Message:
Logged In: YES 
user_id=31435

It looks like there are two bugs here.  One is that the "integer 
addition" detail doesn't make sense, since the user isn't doing 
any integer addition here (sorry, no, repr() is irrelevant to 
this).

Second, it shouldn't be complaining in the last two cases at 
alll.  If the numbers truly were out of range, then 
rangeobject.c's range_new() would have raised a "too many 
items" exception.  Note:

>>> from sys import maxint as m
>>> xrange(0, m, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: integer addition
>>> xrange(-m, m, 2)
xrange(-2147483647, 2147483647, 2)
>>>

The second xrange() there contains twice as many items as 
the first one, but doesn't complain.  It's code in PyRange_New
() that's making the bogus complaint, and I can't figure out 
what it thinks it's doing.

The code in get_len_of_range() is correct.

The code in PyRange_New() is both overly permissive (e.g., it 
silently lets "(len - 1) * step" overflow), and overly restrictive 
(e.g, I can't see why it should matter if "last > (PyInt_GetMax
() - step))" -- the only thing in that specific branch that 
*should* matter is whether the integer addition "start + (len -
 1) * step" overflowed (which it isn't checking for correctly, 
even assuming the multiplication didn't overflow).

The obvious fix for xrange() is to speed range_new() by 
throwing away its call to the broken PyRange_New().  
range_new() is already doing a precise job of checking 
for "too big", and already knows everything it needs to 
construct the right rangeobject.

That would leave the PyRange_New() API call with broken 
overflow checking, but it's not called from anywhere else in 
the core.

----------------------------------------------------------------------

Comment By: Hallvard B Furuseth (hfuru)
Date: 2004-08-05 18:42

Message:
Logged In: YES 
user_id=726647

The only reason I can see that xrange(0, maxint, 3)
gives integer overflow is that repr() returns
'xrange(first, last + step, step)',
where last + step would be too large.

I suggest repr() is changed to return
'xrange(first, last + step/abs(step), step)'.

----------------------------------------------------------------------

Comment By: Hallvard B Furuseth (hfuru)
Date: 2004-08-05 17:14

Message:
Logged In: YES 
user_id=726647

> Do you have a real use case for this?

For the 'hopefully bugs' variants, yes:

#1: Loop forever:

  for i in xrange(x, sys.maxint, y)

That's a lot faster than

  i = x
  while True: ... i += y

#2: 'loop until optional upper bound':

  def some_loop(start, end = sys.maxint):
    for i in xrange(start, end, whatever())

> Do any real apps need to loop over more than
> sys.maxint integers?

The answer may be yes nowadays.  Even my old
Solaris can find primes up to maxint/2 in just
2 hours.  That's a loop over maxint/4 integers.
Though the remaining 3/4 come slower:-)

Still, I expect variants of the above code would
be less uncommon, like some_loop(-100).

> It would be ashamed to muckup the high
> performance implementation for something that
> does not arise in practice.

I hope you do not mean xrange(0, maxint, 3).

If you mean xrange(-100, maxint): Maybe xrange
could be split in several types (fast and slower)
and the xrange() operator would return one of
these, similar to how int() now can return long?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-08-05 17:12

Message:
Logged In: YES 
user_id=21627

The OverflowErrors are definitely intentional, and by
design. xrange() is documented as working the same way as
range(), so any error in the documentation is an error in
the range() documentation.

Reclassifying this as a documentation bug.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-05 15:49

Message:
Logged In: YES 
user_id=80475

Do you have a real use case for this?  Do any real apps need
to loop over more than sys.maxint integers?  It would be
ashamed to muckup the high performance implementation for
something that does not arise in practice.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1003935&group_id=5470