[ python-Feature Requests-1003935 ] xrange overflows
SourceForge.net
noreply at sourceforge.net
Tue Jul 12 00:24:40 CEST 2005
Feature Requests item #1003935, was opened at 2004-08-05 15:16
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1003935&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Hallvard B Furuseth (hfuru)
Assigned to: Raymond Hettinger (rhettinger)
Summary: xrange overflows
Initial Comment:
These restrictions are undocumented both in the
xrange doc string and in the reference manual
(Info node 'XRange Type'):
>>> xrange(maxint, maxint + 10)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: long int too large to convert to int
>>> xrange(-100, maxint)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: xrange() result has too many items
I hope the overflows below are bugs and not
features. It works if 3/-3 is replaced with 1/-1:
>>> xrange(0, maxint, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: integer addition
>>> xrange(0, -maxint, -3)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: integer addition
Python installation:
Python 2.3.3 (#1, May 25 2004, 20:22:36)
[GCC 3.2.3] on sunos5
Type "help", "copyright", "credits" or "license"
for more information.
>>> from sys import maxint
>>> "%x" % maxint
'7fffffff'
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2005-07-12 00:24
Message:
Logged In: YES
user_id=21627
Using xrange for an infinite loop qualifies as "cute" =
"obviously straining for effect". The most natural way of an
infinite loop is "while True". There are certainly other
ways to express an infinite loop (like reading from
/dev/zero, or writing an unbounded recursion), but arguing
that xrange is "much faster" is obviously straining for
effect: If the loop is infinite, how can it be much faster?
And why does it matter if it is? (in my measurements, it is
30% faster, counting the time for a given number of iterations).
----------------------------------------------------------------------
Comment By: Hallvard B Furuseth (hfuru)
Date: 2005-07-11 12:16
Message:
Logged In: YES
user_id=726647
What is "cute" about using maxint as an xrange() limit,
and why would it impact its performance to return its 2nd
parameter as endvalue +/- 1 (an int) instead of trying to
return endvalue + step (which can be long)?
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2005-07-11 01:11
Message:
Logged In: YES
user_id=80475
Will look at it to see if there is something simple that can
be done. These were design decisions -- xrange() and
count() are supposed to be simple and fast -- with other
tools being preferred if you are pushing beyond the limit of
normal use cases. IOW, it is not worth crippling their
performance just because someone has discovered cute ways of
using sys.maxint.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2005-07-10 23:19
Message:
Logged In: YES
user_id=31435
Unassigned myself, since I don't plan to do anything else
here.
----------------------------------------------------------------------
Comment By: Connelly (connelly)
Date: 2004-12-06 08:04
Message:
Logged In: YES
user_id=1039782
Yes, I run into this bug all the time. For example, I may
want to generate all strings of length n:
BINARY_CHARSET = ''.join([chr(i) for i in range(256)])
def all_strs(n, charset=BINARY_CHARSET):
m = len(charset)
for i in xrange(m**n):
yield ''.join([charset[i//m**j%m] for j in range(n)])
This is correct algorithmically, but fails due to the buggy
Python xrange() function. So I end up writing my own irange
() function.
Other cases where I've run into this problem: Sieving for
primes starting at a given large prime, checking for integer
solutions to an equation starting with a large integer, and
searching very large integer sets.
itertools.count and itertools.repeat are similarly annoying.
Often I want to search the list of all positive integers starting
with 1, and have to use an awkward while loop to do this, or
write my own replacement for itertools.count.
There should be little performance hit for apps which use
xrange(n), where n fits in the system's integer size. xrange
() can just return an iterator which returns system ints, and
the only performance penalty is an extra if statement in the
call to xrange (and there is no performance penalty in the
iterator's next() method).
Finally, this move appears consistent with Python's values, ie
simplicity, long ints supported with no extra effort, avoid
gotchas for newbies, no special cases, etc.
----------------------------------------------------------------------
Comment By: Hallvard B Furuseth (hfuru)
Date: 2004-08-08 20:09
Message:
Logged In: YES
user_id=726647
Here is why repr() is relevant - though the error message
was certainly weird. With the latest CVS version:
>>> xrange(maxint-1, maxint, 2)
xrange(2147483646, -2147483648, 2)
Which is why I suggested to return last+step/abs(step)
as the 2nd argument.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-08-08 09:20
Message:
Logged In: YES
user_id=31435
Changed range_new() to stop using PyRange_New(). This
fixes a variety of bogus errors.
Documented that xrange() is intended to be simple and fast,
and that CPython restricts its arguments, and length of its
result sequence, to native C longs.
Added some tests that failed before the patch, and repaired
a test that relied on a bogus OverflowError getting raised.
Doc/lib/libfuncs.tex; new revision: 1.171
Lib/test/test_xrange.py; new revision: 1.2
Objects/rangeobject.c; new revision: 2.53
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-08-06 07:10
Message:
Logged In: YES
user_id=31435
It looks like there are two bugs here. One is that the "integer
addition" detail doesn't make sense, since the user isn't doing
any integer addition here (sorry, no, repr() is irrelevant to
this).
Second, it shouldn't be complaining in the last two cases at
alll. If the numbers truly were out of range, then
rangeobject.c's range_new() would have raised a "too many
items" exception. Note:
>>> from sys import maxint as m
>>> xrange(0, m, 2)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
OverflowError: integer addition
>>> xrange(-m, m, 2)
xrange(-2147483647, 2147483647, 2)
>>>
The second xrange() there contains twice as many items as
the first one, but doesn't complain. It's code in PyRange_New
() that's making the bogus complaint, and I can't figure out
what it thinks it's doing.
The code in get_len_of_range() is correct.
The code in PyRange_New() is both overly permissive (e.g., it
silently lets "(len - 1) * step" overflow), and overly restrictive
(e.g, I can't see why it should matter if "last > (PyInt_GetMax
() - step))" -- the only thing in that specific branch that
*should* matter is whether the integer addition "start + (len -
1) * step" overflowed (which it isn't checking for correctly,
even assuming the multiplication didn't overflow).
The obvious fix for xrange() is to speed range_new() by
throwing away its call to the broken PyRange_New().
range_new() is already doing a precise job of checking
for "too big", and already knows everything it needs to
construct the right rangeobject.
That would leave the PyRange_New() API call with broken
overflow checking, but it's not called from anywhere else in
the core.
----------------------------------------------------------------------
Comment By: Hallvard B Furuseth (hfuru)
Date: 2004-08-05 18:42
Message:
Logged In: YES
user_id=726647
The only reason I can see that xrange(0, maxint, 3)
gives integer overflow is that repr() returns
'xrange(first, last + step, step)',
where last + step would be too large.
I suggest repr() is changed to return
'xrange(first, last + step/abs(step), step)'.
----------------------------------------------------------------------
Comment By: Hallvard B Furuseth (hfuru)
Date: 2004-08-05 17:14
Message:
Logged In: YES
user_id=726647
> Do you have a real use case for this?
For the 'hopefully bugs' variants, yes:
#1: Loop forever:
for i in xrange(x, sys.maxint, y)
That's a lot faster than
i = x
while True: ... i += y
#2: 'loop until optional upper bound':
def some_loop(start, end = sys.maxint):
for i in xrange(start, end, whatever())
> Do any real apps need to loop over more than
> sys.maxint integers?
The answer may be yes nowadays. Even my old
Solaris can find primes up to maxint/2 in just
2 hours. That's a loop over maxint/4 integers.
Though the remaining 3/4 come slower:-)
Still, I expect variants of the above code would
be less uncommon, like some_loop(-100).
> It would be ashamed to muckup the high
> performance implementation for something that
> does not arise in practice.
I hope you do not mean xrange(0, maxint, 3).
If you mean xrange(-100, maxint): Maybe xrange
could be split in several types (fast and slower)
and the xrange() operator would return one of
these, similar to how int() now can return long?
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2004-08-05 17:12
Message:
Logged In: YES
user_id=21627
The OverflowErrors are definitely intentional, and by
design. xrange() is documented as working the same way as
range(), so any error in the documentation is an error in
the range() documentation.
Reclassifying this as a documentation bug.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-05 15:49
Message:
Logged In: YES
user_id=80475
Do you have a real use case for this? Do any real apps need
to loop over more than sys.maxint integers? It would be
ashamed to muckup the high performance implementation for
something that does not arise in practice.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1003935&group_id=5470
More information about the Python-bugs-list
mailing list