[Matrix-SIG] Re: combinations (summary)

Czerminski, Ryszard ryszard@arqule.com
Wed, 29 Sep 1999 14:08:07 -0400


This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_000_01BF0AA5.94F84666
Content-Type: text/plain;
	charset="iso-8859-1"

I would like to express my appreciation to the people
who took time to help me with combinations problem:

Tim Peters   - for Python code which does EXACTLY what I wanted,
               for advise and for pointers to GNU's GMP package

Mirek Gorski - for pointer to GNU's GMP package
               http://www.math.iastate.edu/cbergman/crypto/gmp/gmp_toc.html

Pat Walters  - for C++ class for generating combinations

Emili Besalu - for pointer to his web page dealing with similar
               problems http://iqc.udg.es/~emili/emili_nc.htm

Thank you all very much! -- Ryszard

ORIGINAL QUESTION:
 
I am looking for a function which could compute
combinations given lexicographical index e.g.

C(1;3,2) -> 1,2
C(2;3,2) -> 1,3
C(3;3,2) -> 2,3

I have found function which does this
( http://math.nist.gov/GAMS.html ) but it is limited
to small numbers since it is using regular 4 bytes
representation for integers and therefore index
range is severly limited ( < 2^32 ).

Any pointers to the software which does this for
integers of arbitrary length would be very much
appreciated.


Ryszard Czerminski   phone: (781)395-1269 x 479
ArQule, Inc.         e-mail: ryszard@arqule.com
200 Boston Avenue    http://www.arqule.com
Medford, MA 02155

P.S.

from Tim Peters:

> I am looking for a function which could compute
> combinations given lexicographical index e.g.
>
> C(1;3,2) -> 1,2
> C(2;3,2) -> 1,3
> C(3;3,2) -> 2,3
>
> I have found function which does this
> ( http://math.nist.gov/GAMS.html ) but it is limited
> to small numbers since it is using regular 4 bytes
> representation for integers and therefore index
> range is severly limited ( < 2^32 ).

As is often the case, relaxing such limits creates other problems; in this
case, the Fortran TOMS 515 algorithm recomputes the number of combinations
from scratch each time thru the inner loop, which is very expensive if
choosing many elements from a very large set.

> Any pointers to the software which does this for
> integers of arbitrary length would be very much
> appreciated.

One is attached.  Note that since this is Python instead of Fortran,
everything is 0-based.  That is, when asking for the i'th combination of m
things taken n at a time, i ranges from 0 up to but not including C(m, n),
and the "canonical set" is taken to be range(m).  The inner loop avoids the
problem above, so this is reasonably fast even for multi-hundred digit
indices.

more-combinations-than-a-reasonable-person-could-want<wink>-ly y'rs  - tim

def _chop(n):
    """n -> int if it fits, else long."""

    try:
        return int(n)
    except OverflowError:
        return n

def comb(m, n):
    """m, n -> number of combinations of m items, n at a time.

    m >= n >= 0 required.
    """

    if not m >= n >= 0:
        raise ValueError("m >= n >= 0 required: " + `m, n`)
    if n > (m >> 1):
        n = m-n
    if n == 0:
        return 1
    result = long(m)
    i = 2
    m, n = m-1, n-1
    while n:
        # assert (result * m) % i == 0
        result = result * m / i
        i = i+1
        n = n-1
        m = m-1
    return _chop(result)

def combatindex(m, n, i):
    """m, n, i -> i'th combination of m items taken n at a time.

    m >= n >= 1 and 0 <= i < comb(m, n) required.

    Return the i'th combination in lexicographic order, as a list
    of n elements taken from range(m).
    The index (i) is 0-based.

    Example:
    >>> for i in range(6):
    ...    print combatindex(4, 2, i)
    [0, 1]
    [0, 2]
    [0, 3]
    [1, 2]
    [1, 3]
    [2, 3]
    """

    if not m >= n >= 1:
        raise ValueError("m >= n >= 1 required: " + `m, n`)
    c = long(comb(m, n))
    if not 0 <= i < c:
        raise ValueError("0 <= i < comb(m,n) required: " + `i, c`)
    result = []
    # have c == comb(m, n), want comb(m-1,n-1)
    c = c * n / m
    # invariant: c == comb(m-1, n-1)
    for element in xrange(m):
        if i < c:
            # take this element, and n-1 from the remaining
            result.append(element)
            n = n-1
            if n == 0:
                break
            # have c == comb(m-1,n), want comb(m-2,n-1)
            c = c * n / (m-1)
        else:
            # skip this element, and take all from the remaining
            i = i-c
            # have c == comb(m-1,n-1), want comb(m-2,n-1)
            c = c * (m-n) / (m-1)
        m = m-1
    assert i == 0
    return result

def _test():
    failures = 0
    m, n = 10, 6
    c = comb(m, n)
    last = [0] * n
    for i in xrange(c):
        this = combatindex(m, n, i)
        if len(this) != n:
            print "*** length error:", m, n, i, this
            failures = failures + 1
        if not last < this:
            print "*** lexicographic error:", m, n, i, last, this
            failures = failures + 1
        last = this
    if this != range(m-n, m):
        print "*** last value wrong:", m, n, c-1, this
        failures = failures + 1
    # c has more than 1400 digits in the next case -- may take a
    # few seconds
    m, n = 5000, 2000
    c = comb(m, n)
    this = combatindex(m, n, 0)
    if this != range(n):
        print "*** first value wrong:", m, n, 0, this
        failures = failures + 1
    this = combatindex(m, n, c-1)
    if this != range(m-n, m):
        print "*** last value wrong:", m, n, c-1, this
        failures = failures + 1
    if failures:
        print "*********** TEST FAILED *************"

if __name__ == "__main__":
    _test()

=========================================================

from Pat Walters:



------_=_NextPart_000_01BF0AA5.94F84666
Content-Type: application/octet-stream;
	name="combo.cpp"
Content-Disposition: attachment;
	filename="combo.cpp"

#include <vector>
#include <algorithm>
#include <iostream.h>
#include <stdio.h>

using namespace std;

#include "combo.h"


// Combination generator class
// Takes a vector of integers as input and generates
// all combinations
/*
int main(int argc, char *argv[])
{
  vector<int> b(3);

  b[0] = 2;
  b[1] = 3;
  b[2] = 2;
  
  combo c(b);
  while (!c.done())
    {
      print_v(c.get_current());
      c.next();
    }
  return(0);
}

Produces

   0    0    0
   0    0    1
   0    1    0
   0    1    1
   0    2    0
   0    2    1
   1    0    0
   1    0    1
   1    1    0
   1    1    1
   1    2    0
   1    2    1
*/

// support function to print a vector
void print_v(const vector <int> &v)
{
	for (int i = 0; i < v.size(); i++)
		printf("%4d ",v[i]);
	printf("\n");
	fflush(stdout);
}

// constructor - takes a vector of ints
// each int represents the maximum number of possibilities for
// that position.
combo::combo(const vector <int> &x)
{
	enough = false;
	high.resize(x.size());
	curr.resize(x.size());
	fill (curr.begin(),curr.end(),0);
	for ( int i = 0; i < x.size(); i++)
		high[i] = x[i] - 1;
	len = high.size();
	limit = len-1;
}

// debuging function to show a combination
void combo::show()
{
	for (int i = 0; i < curr.size(); i++)
		printf("%4d ",curr[i]);
	printf("\n");
	fflush(stdout);
}

// generate the next combination
void combo::next()
{
	bool raised;
	
	if (curr[limit]<high[limit])
		curr[limit]++;
	else
    {
		curr[limit]=0;
		limit--;
		raised=false;
		while (!raised && !enough)
		{
			if (curr[limit]<high[limit])
			{
				curr[limit]++;
				raised=true;
				limit=len;
			}
			else
				if (limit==-1)
					enough=true;
				else
					curr[limit]=0;
				limit--;
		}
    }
}







------_=_NextPart_000_01BF0AA5.94F84666
Content-Type: application/octet-stream;
	name="combo.h"
Content-Disposition: attachment;
	filename="combo.h"

class combo
{
  private :
  vector<int> high;
  bool enough;
  int len;
  int limit;
  vector <int> curr;

  public :
  combo(const vector <int> &high);
  void next();
  vector <int> &get_current() {return(curr);};
  bool done() {return(enough);}
  void show();
};

void print_v(const vector <int> &v);


------_=_NextPart_000_01BF0AA5.94F84666--