Long integers, xrange and number theory

Sat Dec 7 11:22:40 EST 2002

Are there any plans to allow (large) long integer arguments to the
built-in functions range and xrange?  I assume that this is part of
the int/long unification, but was unable to find any mention of it in
PEP 237.

I've been writing a small number theory library for use in teaching a
course on elementary number theory, and have run into the xrange
argument restriction on a number of occasions; the result has usually
been code which is either significantly less readable (using
equivalent while-loops) or less efficient (using a home-brewed integer
range generator).  Is there a better way?  And is there a really good
reason why xrange doesn't accept long integer arguments?

Here are two examples to motivate allowing xrange to accept long
integer arguments.

(1) Lehman's O(n**(1/3)) (actually n**(1/3 + epsilon)) method for
integer factorization: suppose an integer n >= 3 has a factor that
lies between n**(1./3) and n**(2./3).  Then Lehman's algorithm is
guaranteed to find a proper factor of n.  When combined with trial
division up to n**(1./3), this gives a simple factorization algorithm,
which runs significantly faster than straight trial division for large
n (>= 10**15, say).

In words the algorithm is as follows: for all integers k and a
satisfying 1 <= k <= n**(1./3) and 4*k*n <= a**2 <= 4*k*n + n**(2./3),
test the quantity a*a-4*k*n to see whether it is a square or not.  If
it is, equal to b**2, say, then the greatest common divisor of n and
(a+b)/2 gives a proper factor of n, which can be returned.  This
translates neatly into the following code...

def Lehman(n):
    n13, n23 = iCbrt(n), iCbrt(n*n)
    for k in xrange(1, n13+1):
        for a in xrange(iSqrt(4*k*n-1)+1, iSqrt(4*k*n+n23)+1):
            if isSquare(a*a-4*k*n):
                return gcd((a+iSqrt(a*a-4*k*n))/2,n)

...where iSqrt(m) and iCbrt(m) return the (integer) floor of the square
root and cube root of an integer m respectively, gcd(a,b) returns the
greatest common divisor of a and b and isSquare(m) returns True if
m=l*l for some integer l and False otherwise.  But this code doesn't
work:

>>> Lehman(720676707764753)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "ElemNum.py", line 641, in Lehman2
    for a in xrange(iSqrt(4*k*n-1)+1, iSqrt(4*k*n+n23)+1):
OverflowError: long int too large to convert to int
>>>

The long-integer-safe version of the code is:

def Lehman(n):
    n13, n23 = iCbrt(n), iCbrt(n*n)
    k = 1
    while k <= n13:
        a = iSqrt(4*k*n-1)+1
        alimit = iSqrt(4*k*n+n23) # avoid recomputing iSqrt for every a
        while a <= alimit:
            if isSquare(a*a-4*k*n):
                return gcd((a+iSqrt(a*a-4*k*n))/2,n)
	    a += 1
        k += 1

which returns the factor 22073477 of 720676707764753 in a little under
five seconds on my machine.   It's a shame that the first version of
the code doesn't work, since I think it's a lot clearer, especially
for students who aren't particularly used to reading code.

(2) Finding a quadratic nonresidue modulo a large prime p.

It's not vital to know what this means to understand the example (but
see below): suffice it to say that if p >= 3 is a prime number, then
every integer 1 <= a <= p-1 is either a `quadratic residue' or a
`quadratic nonresidue'; there are exactly (p-1)/2 quadratic residues
and (p-1)/2 quadratic nonresidues, but it's rather hard to say
anything about the order in which the residues and nonresidues appear;
in practice one expects the first nonresidue to be small (less than
2(log(p))**2, for instance).  The following code is obvious, simple,
clear, and practical for prime numbers up to a few hundred digits
long:

def findQN(p):
    for a in xrange(1,p):
        if isQN(a):   # isQN(a) is True if a is a nonresidue, else False
            return a

...but it fails as soon as p >= 2**31 (on a 32-bit machine), thanks to
the limitations of xrange.  One can rewrite it using a while loop, but
it becomes less clear.  One can replace the upper bound p with a
smaller limit, but then there's a lot of brain work to decide what a
sensible choice for that limit should be, what to do in the event
that it turns out not to be large enough, and so on.

(Definition for those who still want to know: 1 <= a <= p-1 is a
quadratic residue (modulo p) if there is an integer x such that x*x-a
is a multiple of p.  It's a quadratic nonresidue (modulo p) if there
is no such integer.  Finding quadratic nonresidues is important for
things like extracting square roots and solving quadratic equations
over finite fields.)

These examples are simple ones, and one can argue that the loss in
readability in replacing for loops with while loops is small.  But in
more complicated algorithms with large numbers of for-loops I believe
the loss can be severe.  Please can we have xrange working with long
integer arguments?

All the best,

Mark