Proving optimized function is still correct
Fernando Pérez
fperez528 at yahoo.com
Sun Nov 11 17:10:09 EST 2001
>
> The preconditions are that arguments m and n are always positive integers.
> She amazingly reduced the above code to the following:
>
> def count_combos_optimised( m, n ):
> numer = 1
> denom = 1
> j = m+n-1
> i = 1
> while i<m:
> numer = numer * j
> denom = denom * i
> j = j - 1
> i = i + 1
> return numer/denom
>
Her 'amazing' reduction is possible simply because it's a very well known
formula of combinatoric analysis. By the way, you should use long integers so
that the factorials don't blow up in your face for large numbers.
You got me thinking. Below is a quick and dirty implementation of the same
using Numerical Pyhton and some routines adapted from Numerical Recipes in C
(which means they're not the most optimized in the world by any stretch of
the imagination). I also added a copy of yours fixed to use longs:
#--------------------------------- begin file
"""Quick and dirty factorials and combinatorics.
Some stuff copied from NumRecipes."""
# fix to use longs instead of ints
def count_combos_optimised( m, n ):
numer = 1L
denom = 1L
j = m+n-1L
i = 1L
while i<m:
numer = numer * j
denom = denom * i
j = j - 1
i = i + 1
return numer/denom
from math import log,exp,floor
import Numeric
def gammln(x):
"""Return the natural log of gamma(x)"""
N = Numeric
cof = N.array([76.18009172947146,-86.50532032941677,
24.01409824083091,-1.231739572450155,
0.1208650973866179e-2,-0.5395239384953e-5],N.Float)
y = N.arange(x+1,x+7,typecode=N.Float)
tmp = x+5.5
tmp -= (x+0.5)*log(tmp)
ser = 1.000000000190015 + N.sum(cof/y)
return -tmp+log(2.5066282746310005*ser/x)
def bico(n,k):
"""Return binomial coefficient (n k)"""
return int(floor(0.5+exp(gammln(n+1.0)-gammln(k+1.0)-gammln(n-k+1.0))))
def combi_dist(m,n):
"""Return the number of combinations of m mutually distinguishable
elements in which each element may occur 0,1,2,...,n times in any
combination."""
# this is just bico(m+n-1,n). Write it out explicitly for speed
nn = m+n-1
kk = n
return floor(0.5+exp(gammln(nn+1.0)-gammln(kk+1.0)-gammln(nn-kk+1.0)))
#--------------------------------- end file
Your colleague's algorithm is very fast for small m, but dies badly for large
m. Basically hers is O(m) in execution time (roughly), while my (well,
NumRecipes') version is roughly constant in execution time (indep. of m,n).
Here's some timings of hers vs mine (yes, that's a python prompt, times in
seconds):
For large m:
In[64]:= m,n,reps
Out[64]= (3000, 20, 100)
In[65]:= time_test reps,combi_dist ,m,n
-------> time_test (reps,combi_dist ,m,n)
Out[65]= 0.069999999999993179
In[66]:= time_test reps,count_combos_optimised ,m,n
-------> time_test (reps,count_combos_optimised ,m,n)
Out[66]= 17.5
And for small m:
In[67]:= m=30
In[68]:= time_test reps,combi_dist ,m,n
-------> time_test (reps,combi_dist ,m,n)
Out[68]= 0.07000000000000739
In[69]:= time_test reps,count_combos_optimised ,m,n
-------> time_test (reps,count_combos_optimised ,m,n)
Out[69]= 0.030000000000001137
So, study your problem domain and decide whether you'll need small or large
m. If you'll need both and really need speed, blend the two implementations
into one with a branch cutoff (you'll have to play a bit to determine the
threshold).
One advantage of your colleague's version: b/c it uses longs, it will produce
exact results always. Mine is ultimately limited to things that fit into a
float, so it will blow up in extreme cases.
Hope this helps,
F.
PS. As for your original question, yes, her solution is correct ;)
More information about the Python-list
mailing list