[SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins

josef.pktd at gmail.com josef.pktd at gmail.com
Sun Jan 12 22:28:38 EST 2014


nobs = 5
rr = 2

We have 5 balls and 2 bins.
We can put as many balls as we want (0 to 5) in the first bin.
Then we put as many balls as we want (and are available) in the second bin.
And so on if we had more bins.

I'd like to enumerate all possible bin allocations, in this case 21:

>>> m_all
array([[0, 0],
       [0, 1],
       [0, 2],
       [0, 3],
       [0, 4],
       [0, 5],
       [1, 0],
       [1, 1],
       [1, 2],
       [1, 3],
       [1, 4],
       [2, 0],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2],
       [4, 0],
       [4, 1],
       [5, 0]])


Here is a simple way to caculate it. The number of cases gets large very fast.
-----------
import itertools
import numpy as np

nobs = 10
rr = 3
m_all = np.array(list(itertools.ifilter(lambda ii: sum(ii)<=nobs,
                                 itertools.product(*[range(nobs + 1)]*rr))))
print m_all.shape
print np.bincount(m_all.sum(1))
-----------

Is there a faster way to do this?

I need to do some calculation on each case, so either a numpy array or
an iterator works.



The motivation:

We have two kinds of machines, 10 each that we want to test for failing early.
The test has the null hypothesis that the failure distributions are the same.
Against a one-sided alternative that the first kind of machines fails earlier.

We run the 20 machines and count how many machines of the first kind
has failed by the time that we see the rth (3rd) failure of the second
kind.
If a large number of the machines of the first kind have failed, then
we reject the Null that they have the same failure distribution.

We don't have time to wait until all machines fail. So we need to do
"very small" sample statistics.

Wilcoxon-type precedence tests.


Josef
It's Big Data, if we have big machines.



More information about the SciPy-User mailing list