3-arg float pow()
Markus Schaber
markus at schabi.de
Mon Sep 3 13:34:55 EDT 2001
Hi,
David C. Ullrich <ullrich at math.okstate.edu> schrub:
> Seems like pow(3., 500., 7.) can't possibly do anything except just
> 3.**500. % 7(???), hence the random nature of the result and the utter
> uselessness.
The trick is that it is a exponential function with large result modulo
a small number. (such operations are e. G. used in the RSA crypto
algorithm)
When calculating the exponential function using software emulation (as
I know of no decent hardware support for long Integers), you have more
or less a loop of multiplications or so (depending how optimized your
algorithm is).
One optimized algorithm is to split the 500 into powers of two, so
3**500 gets to 3**(256 + 128 + 64 + 32 + 16 + 4) == (3**256) * (3**128)
* (3**64) * (3**32) * (3**16) * (3**4). As you see, you can generate
the bracketed values by repeatedly multiplying the number with itsself
(which is rather fast) and then multiplying all those values. In a
computer, you can quite optimize this, as the 500 is already given in
binary. Thus one can iterate like the following:
def simple(base, exp, modulo):
res = 1L
while exp != 0L:
if exp & 1L:
res *= base
exp >>=1L
base *= base
return res%modulo
This gives 9 rounds with two comparisons, one shift and maximum two
multiplications per loop, thus a maximum of 45 operations compared to
500 when using simple plain loop.
The problem is when the intermetiate number gets rather large (which is
the case here, we get 3**500 ==
36360291795869936842385267079543319118023385026001623040346035832580600191583895484198508262979388783308179702534403855752855931517013066142992430916562025780021771247847643450125342836565813209972590371590152578728008385990139795377610001).
Using dynamically sized numbers (as our long integers are), this can
get rather big (and thus memory- and cpu-intensive) compared to the
final result 2.
But as we calculate modulo 7, we can apply the modulo after every
single step of shifting, and so keep the numbers small (which
is much faster given some numbers, and uses much less memory)
def optimized(base, exp, modulo):
res = 1L
while exp != 0L:
if exp & 1L:
res *= base
res %= modulo
exp >>=1L
base *= base
base %= modulo
return res
Especially when calculating lots of numbers (such as applying this to a
list or matrix), this can save some megabyte intermediate storage.
A test like
start = time.clock()
for i in xrange(1000):
void = simple(3L,5000L,7L)
stend = time.clock()
print 'simple:',round(stend-start,2)
gives 7.28 seconds for the simple and 0.24 seconds for the
modulo-optimized version on my machine.
On the other side this doesn't apply when using small integers (as they
don't expand in size), and doesn't gain much when the exponental result
and the modulo factor are about the same size. E. G. using 50 as exp in
the example above gives 0.09 seconds in the simple and 0.13 seconds in
the optimized version due to the additional modulo operations.
Additionally, the second algorithm can be rewritten to calculate the
result using ordinary integers (simply remove all the "L"s in the def
and at the call), and this way only needs 0.05 seconds, whereas the
first algorithm gives an OverflowError. A math library coder could use
some heuristics to dynamically decide which algorithm to use.
I ran an additional test, calculating pow(3,5000000,7), and the
unoptimized version (all Longs) needs about 50 seconds for _one_ loop,
whereas the optimized version needs for _1000_ loops 0.35 seconds using
longs, and 0.13 seconds using integers. Btw, the builtin pow function
needs 0.15 and 0.01 seconds
But I don't know whether this trick is usable in floating point, as
most floating point is done in hardware nowadays. I can imagine that
some FPU have machine commands to gain additional speed and maybe
exactness this way.
Another point may be that when processing arrays or matrices, a
sophisticated programmer may use MMX, 3DNow!, ISSE or AltiVec to gain
speed by processing something in parallel, but I'm afraid this would be
not so easy to incorporate in python, because there numbers are stored
as references to objects.
PS: I'm away for a week from tomorrow on, so please be patient with my
answers to your comments / questions.
markus
--
"The strength of the Constitution lies entirely in the determination of
each citizen to defend it. Only if every single citizen feels duty
bound to do his share in this defense are the constitutional rights
secure." -- Albert Einstein
More information about the Python-list
mailing list