Proving optimized function is still correct

Tue Nov 13 00:46:48 EST 2001

<salvasan at yahoo.com> wrote in message
news:9sq46c$o93$1 at news.carib-link.net...
> I gave a colleague of mine a Python function to optimise:

Responding would have been easier if you had given a succinct
description of what your function computes:  the number of ways of
distributing n indistinguishable items among m labelled bins, with
bins possibly left empty, and where 'labelled' means that each
permutation among bins is counted separately.

> def count_combos( m, n, trial=[] ):
>     #is trial complete?
>     if len(trial) == m:
>         #check if trial is valid
>         total = 0
>         for x in trial:
>             total = total + x
>         if total == n:
>             #count this valid trial
>             return 1
>         #otherwise do not count it
>         return 0
>     else:
>         #RECURSIVE CALL
>         count = 0
>         for i in range( n+1 ):
>              #extend trial by another test element
>              trial.append(i)
>              #total up valid trials found
>              count = count + count_combos( m, n, trial )
>              #remove last appended element
>              del( trial[-1] )
>         return count

The above generates the (n+1)**m ways of putting 0..n items in each
bin independently and then counts those for which the separate numbers
add up to n.  This brute-force method is about maximally inefficient.
The following is a more direct recursion:

def d(m,n):
  if m == 1 or n == 0: return 1
  cnt = 0
  for i in range(n+1): cnt = cnt + d(m-1,n-i) # ie, put i items in bin
m and distribute rest
  return cnt

>>> d(6,4)
126

> The preconditions are that arguments m and n are always positive
integers.

No, n may be 0, as shown by inspection of your code (and consideration
of the problem).

> She amazingly reduced the above code to the following:

No, she tossed it out and instead calculates the well-known formula
that solves this standard combinatorial problem.

> def count_combos_optimised( m, n ):
>     numer = 1
>     denom = 1
>     j = m+n-1
>     i = 1
>     while i<m:
>         numer = numer * j
>         denom = denom * i
>         j = j - 1
>         i = i + 1
>     return numer/denom

This calculates (n+m-1)!/n!(m-1)!.  Example: 9*8*7*6*5 / 5*4*3*2*1 =
126.

> The optimised version operates a lot faster, but how can I be sure
> that count_combos_optimised( m, n ) and count_combos( m, n ) will
> return the same values for every possible pairing of positive
integers
> (m,n) ?

The standard method would be mathematical induction: substitute the
factorial formula into the recursive formula embodied in d() above and
show equalilty (ugh).  Fortunately, in this case, there is a direct
argument.  Line up n+m-1 items, bounded on each end by a boundary
marker.  Select m-1 and turn them into boundary markers.  This leave n
items distributed among k distinguishable (by order) bins.  There are
(n+m-1)!/n!(m-1)! ways of doing this.

Terry J. Reedy