[Tutor] recursive factoring
Kirby Urner
urnerk@qwest.net
Thu, 17 Jan 2002 10:43:50 -0800
As I expected, a sieve-based method is somewhat faster
than chewing off smallest factors while counting up
from 1. And once the factors get big enough the sieve-
based method remains practical (to some further point),
while the counting-by-1 approach dies, simply because
it has to go "all the way" before figuring out it's
dealing with a prime.
To profile, I use the function below, which takes the
function to test, and its argument, as its own arguments:
import time
def profile(f,arg):
a = time.time()
print f(arg)
b = time.time()
return b-a
>>> profile(factor,20034321)
[3, 79, 84533]
0.060000061988830566
is sieve-based. Then I try Michele's (original version):
>>> profile(primeNumber,20034321)
[3, 79, 84533]
0.1099998950958252
Somewhat slower.
However, the sieve-based version isn't so pretty to look
at. Likely this could be simplified. It's not recursive:
import math
from primes import sieve
def factor(n):
plist = sieve(math.sqrt(n))
pfacts = []
ldiv=1
while 1:
if ldiv:
if n==1: break
else: ldiv=0
for i in plist:
if n%i==0:
ldiv=1
n = n/i
pfacts.append(i)
if not ldiv:
pfacts.append(n)
break
return pfacts
The basic idea is it lists primes up to the sqrt of n,
and if none of those divide, no divisor is found, and
n itself is considered prime. But in most cases, you
get into a loop and break out from the top when n==1.
But on top of all that, you still need the sieve. I
included an efficient one a few posts back, and that's
what I'm importing from primes.py.
E.g. when I ask for the factors of 211, the sieve returns
primes only up to sqrt(211) i.e. [2, 3, 5, 7, 11, 13].
Since all those fail to divide 211 evenly, we return
out the bottom, appending 211 itself as a prime factor.
The sieve itself has another sqrt in it, i.e. since we
only need primes up to 13, it's only going to test divide
by 2 and 3. Actually, it begins with odds 3 5 7 9 11 13
with 2 already known to be prime. After eliminating the
9 (the first 3 is kept), we're done.
This is all quite efficient. Let's try a much bigger
number:
>>> profile(factor,2133431113212)
[2, 3, 7, 17, 2, 3, 3, 165999931L]
4.2300000190734863
The sieve-based function runs in under 5 seconds (depends on
processor of course), whereas the non sieve-based function
isPrimenumber gets stuck counting to that last prime factor
by 1, which takes a long time. Just try against that last
factor (165999931) to see why one might want to sieve:
>>> profile(primeNumber,165999931) # ouch
[165999931]
205.37000000476837
>>> profile(factor,165999931) # sometimes 0.0
[165999931]
0.059999942779541016
There's a method called Miller-Rabin which generates
very large numbers very likely to be prime. These are
so far not practical to crack (factor), which is why we
have a "trap door" function -- something we can do, but
not undo.
The RSA algorithm takes two of these huge probable primes
and multiplies them together for a public key. Even
though the key is public, no known techniques let us
factor such a number in reasonable time. If a public
key could be factored, the code could be broken (people
used to think 512-bit public keys were safe, but a team
cracked a challenge public key of that size, so now
everyone says to go 1024 bits at least -- note that
every bit *doubles* the keyspace, so doubling the key
size is way more than doubling the difficulty of
factoring).
Raising to powers modulo N is a cyclic thing -- you keep
rotating through the same numbers. Knowing the factors
of N and the power the message had been raised to (which
is also public) would tell you how to raise the ciphertext
to a countering power that'd rotate it back through to
the message itself, one power after message^0 = 1 on the
dial. Not knowing the factors, the dial is just too big
around to try every power-position.
Here's a 50-digit probable prime:
>>> from primes import bigppr
>>> bigppr(50)
Working...
Percent chance of being prime: 100.0
Elapsed time: 0.399939657051 seconds
757795979917279974119994023697095922339387199771439L
That's out of range for my sieve-based program, to
check for factors (most likely 1, and the above are
the only two).
Kirby