Long integers, xrange and number theory

Bengt Richter bokr at oz.net
Sat Dec 7 18:20:40 EST 2002


On Sat, 7 Dec 2002 11:22:40 -0500 (EST), Mark Edward Tristan Dickinson <dickinsm at umich.edu> wrote:

>Are there any plans to allow (large) long integer arguments to the
>built-in functions range and xrange?  I assume that this is part of
>the int/long unification, but was unable to find any mention of it in
>PEP 237.
>
>I've been writing a small number theory library for use in teaching a
>course on elementary number theory, and have run into the xrange
>argument restriction on a number of occasions; the result has usually
>been code which is either significantly less readable (using
>equivalent while-loops) or less efficient (using a home-brewed integer
I suspect the xrange vs generator efficiency worry is disproportionate ;-)
(see below).

>range generator).  Is there a better way?  And is there a really good
>reason why xrange doesn't accept long integer arguments?
I agree that should be fixed.

>
>Here are two examples to motivate allowing xrange to accept long
>integer arguments.
>
>(1) Lehman's O(n**(1/3)) (actually n**(1/3 + epsilon)) method for
>integer factorization: suppose an integer n >= 3 has a factor that
>lies between n**(1./3) and n**(2./3).  Then Lehman's algorithm is
>guaranteed to find a proper factor of n.  When combined with trial
>division up to n**(1./3), this gives a simple factorization algorithm,
>which runs significantly faster than straight trial division for large
>n (>= 10**15, say).
>
>In words the algorithm is as follows: for all integers k and a
>satisfying 1 <= k <= n**(1./3) and 4*k*n <= a**2 <= 4*k*n + n**(2./3),
>test the quantity a*a-4*k*n to see whether it is a square or not.  If
>it is, equal to b**2, say, then the greatest common divisor of n and
>(a+b)/2 gives a proper factor of n, which can be returned.  This
>translates neatly into the following code...
>
>def Lehman(n):
>    n13, n23 = iCbrt(n), iCbrt(n*n)
>    for k in xrange(1, n13+1):
>        for a in xrange(iSqrt(4*k*n-1)+1, iSqrt(4*k*n+n23)+1):
>            if isSquare(a*a-4*k*n):
>                return gcd((a+iSqrt(a*a-4*k*n))/2,n)

Remember, Python is not C++, so if you want speed, I think you might gain more
by avoiding recalculations (which a good optimizer would do for you in a
static situation) than you'll lose by using a generator in place of xrange.
E.g., maybe this (untested):

def Lehman(n):
    n13, n23 = iCbrt(n), iCbrt(n*n)
    for k in xxrange(1, n13+1):
        t_4kn = 4*k*n
        for a in xxrange(iSqrt(t_4kn-1)+1, iSqrt(t_4kn+n23)+1):
            if isSquare(a*a - t_4kn): # don't save, since it must fail all but once
                return gcd((a+iSqrt(a*a - t_4kn))/2,n)

where

def xxrange(lo, hi): # too lazy to deal with 1 and 3 args, plus it's faster for above ;-)
    while lo < hi:
        yield lo
        lo += 1

(BTW, you'll need from __future__ import generators at the top of the file)

If you want to tune further, you could make sure that iCbrt and xxrange and iSqrt and isSquare
have local bindings inside Lehman, and use those in the loop instead of global names.

I have a hunch other stuff could be tuned up too, but if you're not careful, you'll
wind up teaching Python optimization instead of number theory ;-)

[...more interesting stuff snipped...]
>These examples are simple ones, and one can argue that the loss in
>readability in replacing for loops with while loops is small.  But in
>more complicated algorithms with large numbers of for-loops I believe
>the loss can be severe.  Please can we have xrange working with long
>integer arguments?
>
I think it will be fixed eventually, but for now I'd just use the xxrange
above (untested) and not worry about what I'd bet is not proportionally
that big a time drain in the algorithm.

But I'd be interested to know the time for the example factor 22073477 of 720676707764753
if you just plug in xxrange, and then if you optimize to avoid the recalculations.
And then with local bindings, if you're curious. Or if you have an URL to all the
relevant code I might try it.

Regards,
Bengt Richter



More information about the Python-list mailing list