[Tutor] recursive factoring

Kirby Urner urnerk@qwest.net
Thu, 17 Jan 2002 10:43:50 -0800


As I expected, a sieve-based method is somewhat faster
than chewing off smallest factors while counting up
from 1.  And once the factors get big enough the sieve-
based method remains practical (to some further point),
while the counting-by-1 approach dies, simply because
it has to go "all the way" before figuring out it's
dealing with a prime.

To profile, I use the function below, which takes the
function to test, and its argument, as its own arguments:

    import time

    def profile(f,arg):
	a = time.time()
	print f(arg)
	b = time.time()
	return b-a

   >>> profile(factor,20034321)
   [3, 79, 84533]
   0.060000061988830566

is sieve-based.  Then I try Michele's (original version):

   >>> profile(primeNumber,20034321)
   [3, 79, 84533]
   0.1099998950958252

Somewhat slower.

However, the sieve-based version isn't so pretty to look
at.  Likely this could be simplified.  It's not recursive:

  import math
  from primes import sieve

  def factor(n):	
     plist = sieve(math.sqrt(n))
     pfacts = []
     ldiv=1
     while 1:
         if ldiv:
            if n==1:  break
            else:  ldiv=0
         for i in plist:
	    if n%i==0:
                ldiv=1
                n = n/i
                pfacts.append(i)
         if not ldiv:
             pfacts.append(n)
             break
     return pfacts

The basic idea is it lists primes up to the sqrt of n,
and if none of those divide, no divisor is found, and
n itself is considered prime.  But in most cases, you
get into a loop and break out from the top when n==1.

But on top of all that, you still need the sieve.  I
included an efficient one a few posts back, and that's
what I'm importing from primes.py.

E.g. when I ask for the factors of 211, the sieve returns
primes only up to sqrt(211) i.e. [2, 3, 5, 7, 11, 13].
Since all those fail to divide 211 evenly, we return
out the bottom, appending 211 itself as a prime factor.

The sieve itself has another sqrt in it, i.e. since we
only need primes up to 13, it's only going to test divide
by 2 and 3.  Actually, it begins with odds 3 5 7 9 11 13
with 2 already known to be prime.  After eliminating the
9 (the first 3 is kept), we're done.

This is all quite efficient.  Let's try a much bigger
number:

   >>> profile(factor,2133431113212)
   [2, 3, 7, 17, 2, 3, 3, 165999931L]
   4.2300000190734863

The sieve-based function runs in under 5 seconds (depends on
processor of course), whereas the non sieve-based function
isPrimenumber gets stuck counting to that last prime factor
by 1, which takes a long time.  Just try against that last
factor (165999931) to see why one might want to sieve:

    >>> profile(primeNumber,165999931)  # ouch
    [165999931]
    205.37000000476837

    >>> profile(factor,165999931) # sometimes 0.0
    [165999931]
    0.059999942779541016

There's a method called Miller-Rabin which generates
very large numbers very likely to be prime.  These are
so far not practical to crack (factor), which is why we
have a "trap door" function -- something we can do, but
not undo.

The RSA algorithm takes two of these huge probable primes
and multiplies them together for a public key.  Even
though the key is public, no known techniques let us
factor such a number in reasonable time.  If a public
key could be factored, the code could be broken (people
used to think 512-bit public keys were safe, but a team
cracked a challenge public key of that size, so now
everyone says to go 1024 bits at least -- note that
every bit *doubles* the keyspace, so doubling the key
size is way more than doubling the difficulty of
factoring).

Raising to powers modulo N is a cyclic thing -- you keep
rotating through the same numbers.  Knowing the factors
of N and the power the message had been raised to (which
is also public) would tell you how to raise the ciphertext
to a countering power that'd rotate it back through to
the message itself, one power after message^0 = 1 on the
dial.  Not knowing the factors, the dial is just too big
around to try every power-position.

Here's a 50-digit probable prime:

 >>> from primes import bigppr
 >>> bigppr(50)
Working...
Percent chance of being prime: 100.0
Elapsed time: 0.399939657051 seconds
757795979917279974119994023697095922339387199771439L

That's out of range for my sieve-based program, to
check for factors (most likely 1, and the above are
the only two).

Kirby