[SciPy-User] vectorized version of 'multinomial' sampling function

Wed Oct 14 23:05:34 EDT 2009

<josef.pktd at gmail.com> wrote:

>>> If you only want n=1, then it seems to me, you could do this also with
>>> drawing an integer out of range(3) with the given probabilities. This
>>> can be vectorized easily directly with numpy. (just like an individual
>>> observation in multinomial logit)
>>>
>>> Josef
>>>
>>>>
>>>> to get a vector of multinomial samplers, each using the nth list in
>>>> 'p'. something like:
>>>>
>>>> array([[1, 0, 0], [0, 0 1]]) in this case. is this possible? it seems
>>>> like 'multinomial' takes only a one dimensional array. i could write
>>>> this as a "for" loop of course but i prefer a vectorized version since
>>>> speed is crucial for me here.
>>
>>
>> for n=1 the following should work (unless I still cannot tell
>> multinomial distribution and multinomial logit apart)
>>
>> Josef
>>
>>>>> p = np.array([[ 0.9 ,  0.05,  0.05],
>>       [ 0.05,  0.05,  0.9 ]])
>>>>> pcum = p.cumsum(1)
>>>>> pcuma = np.repeat(pcum,20, 0)
>>>>> rvs = np.random.uniform(size=(40))
>>>>> a,b,c = pcuma.T
>>>>> mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
>>>>> mnrvs[:20].sum()
>> 20
>>>>> mnrvs[:20].sum(0)
>> array([19,  0,  1])
>>>>> mnrvs[20:].sum()
>> 20
>>>>> mnrvs[20:].sum(0)
>> array([ 1,  0, 19])
>>>>> mnrvs[15:25]
>> array([[0, 0, 1],
>>       [1, 0, 0],
>>       [1, 0, 0],
>>       [1, 0, 0],
>>       [1, 0, 0],
>>       [0, 0, 1],
>>       [0, 0, 1],
>>       [0, 0, 1],
>>       [1, 0, 0],
>>       [0, 0, 1]])
>>
>>
>> check for larger sample:
>>
>>>>> pcuma = np.repeat(pcum,500, 0)
>>>>> rvs = np.random.uniform(size=(1000))
>>>>> a,b,c = pcuma.T
>>>>> mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)
>>>>> mnrvs[:500].sum(0)
>> array([456,  17,  27])
>>>>> mnrvs[500:].sum(0)
>> array([ 24,  35, 441])
>>>>> p*500
>> array([[ 450.,   25.,   25.],
>>       [  25.,   25.,  450.]])
>>
>
> similar works for e.g. n=5
>
> Josef

hi Josef,

thanks very much for your reply. i see how this works for the specific
case where the multinomial vector of probabilities has 3 numbers in
it, for which you defined the corresponding variables "a", "b" and
"c". but is there a way to rewrite this code to make it work for an
arbitrarily sized vector of probabilities? for example, i'm not sure
how one would generalize this line:

mnrvs = np.column_stack((rvs<=a, (rvs>a) & (rvs<=b), rvs>b)).astype(int)

any ideas on how to do this would be greatly appreciated.  thanks again.