[SciPy-User] Semi-OT: weekend puzzle enumerating number of balls in bins

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Jan 14 19:33:27 EST 2014


On Sun, Jan 12, 2014 at 10:28 PM,  <josef.pktd at gmail.com> wrote:
> nobs = 5
> rr = 2
>
> We have 5 balls and 2 bins.
> We can put as many balls as we want (0 to 5) in the first bin.
> Then we put as many balls as we want (and are available) in the second bin.
> And so on if we had more bins.
>
> I'd like to enumerate all possible bin allocations, in this case 21:
>
>>>> m_all
> array([[0, 0],
>        [0, 1],
>        [0, 2],
>        [0, 3],
>        [0, 4],
>        [0, 5],
>        [1, 0],
>        [1, 1],
>        [1, 2],
>        [1, 3],
>        [1, 4],
>        [2, 0],
>        [2, 1],
>        [2, 2],
>        [2, 3],
>        [3, 0],
>        [3, 1],
>        [3, 2],
>        [4, 0],
>        [4, 1],
>        [5, 0]])
>
>
> Here is a simple way to caculate it. The number of cases gets large very fast.
> -----------
> import itertools
> import numpy as np
>
> nobs = 10
> rr = 3
> m_all = np.array(list(itertools.ifilter(lambda ii: sum(ii)<=nobs,
>                                  itertools.product(*[range(nobs + 1)]*rr))))
> print m_all.shape
> print np.bincount(m_all.sum(1))
> -----------
>
> Is there a faster way to do this?
>
> I need to do some calculation on each case, so either a numpy array or
> an iterator works.
>
>
>
> The motivation:
>
> We have two kinds of machines, 10 each that we want to test for failing early.
> The test has the null hypothesis that the failure distributions are the same.
> Against a one-sided alternative that the first kind of machines fails earlier.
>
> We run the 20 machines and count how many machines of the first kind
> has failed by the time that we see the rth (3rd) failure of the second
> kind.
> If a large number of the machines of the first kind have failed, then
> we reject the Null that they have the same failure distribution.

I think this sentence is wrong, we reject when the test statistic is small.

>
> We don't have time to wait until all machines fail. So we need to do
> "very small" sample statistics.
>
> Wilcoxon-type precedence tests.

In case anyone is interested, I pushed a first complete version
https://github.com/josef-pkt/statsmodels/compare/precedence#diff-459f12a0fad805d03bf9b3d78cb5ee90R218
It's almost a multivariate distribution class, except that we are only
interested in the distributions of a few real functions on it.

It contains Oleksandr's solution, I haven't tried yet Alan's updated solution.

Josef

>
>
> Josef
> It's Big Data, if we have big machines.



More information about the SciPy-User mailing list