[Numpy-discussion] align `choices` and `sample` with Python `random` module

Warren Weckesser warren.weckesser at gmail.com
Tue Dec 11 17:10:55 EST 2018


On Tue, Dec 11, 2018 at 2:27 PM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Tue, Dec 11, 2018 at 10:39 AM Warren Weckesser <
> warren.weckesser at gmail.com> wrote:
>
>> There is no bug, just a limitation in the API.
>>
>> When I draw without replacement, say, three values from a collection of
>> length five, the three values that I get are not independent.  So really,
>> this is *one* sample from a three-dimensional (discrete-valued)
>> distribution.  The problem with the current API is that I can't get
>> multiple samples from this three-dimensional distribution in one call.  If
>> I need to repeat the process six times, I have to use a loop, e.g.:
>>
>>     >>> samples = [np.random.choice([10, 20, 30, 40, 50], replace=False,
>>     ...                             size=3) for _ in range(6)]
>>
>> With the `select` function I described in my previous email, which I'll
>> call `random_select` here, the parameter that determines the number of
>> items per sample, `nsample`, is separate from the parameter that determines
>> the number of samples, `size`:
>>
>>     >>> samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
>>     >>> samples
>>     array([[30, 40, 50],
>>            [40, 50, 30],
>>            [10, 20, 40],
>>            [20, 30, 50],
>>            [40, 20, 50],
>>            [20, 10, 30]])
>>
>> (`select` is a really bad name, since `numpy.select` already exists and
>> is something completely different.  I had the longer name `random.select`
>> in mind when I started using it. "There are only two hard problems..." etc.)
>>
>> Warren
>>
>
> This is an issue for the probability distributions from scipy.stats, too.
>
> The only library that I know of that handles this well is TensorFlow
> Probability, which has a notion of "batch" vs. "event" dimensions in distributions. It's
> actually pretty comprehensive, and makes it easy to express these sorts of
> operations:
>
> >>> import tensorflow_probability as tfp
> >>> import tensorflow as tf
> >>> tf.enable_eager_execution()
> >>> dist = tfp.distributions.Categorical(tf.zeros((3, 5)))
> >>> dist
> <tfp.distributions.Categorical 'Categorical/' batch_shape=(3,) event_shape=() dtype=int32>
> >>> dist.sample(6)
> <tf.Tensor: id=299, shape=(6, 3), dtype=int32, numpy=
> array([[1, 2, 1],
>        [2, 1, 3],
>        [4, 4, 2],
>        [0, 1, 1],
>        [0, 2, 2],
>        [2, 0, 4]], dtype=int32)>
>
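The `random_select` behaviour described in the quoted message (separate `nsample` and `size` parameters) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not an existing NumPy API: `random_select` is the hypothetical name from the email, and the `rng` parameter is an addition for reproducibility. It uses the fact that argsorting a row of i.i.d. uniform keys yields a uniformly random permutation, so the first `nsample` entries of each row are a uniform sample without replacement.

```python
import numpy as np


def random_select(a, nsample, size, rng=None):
    """Hypothetical helper (name from the email; NOT a NumPy function).

    Draw `size` independent samples, each made of `nsample` distinct items
    chosen uniformly without replacement from the 1-D sequence `a`.
    Returns an array of shape (size, nsample).
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    n = a.shape[0]
    if nsample > n:
        raise ValueError("nsample cannot exceed len(a) without replacement")
    # One row of i.i.d. uniform keys per sample; argsort of each row is a
    # uniformly random permutation of range(n) (ties have probability ~0).
    keys = rng.random((size, n))
    idx = np.argsort(keys, axis=1)[:, :nsample]
    # Fancy indexing with a (size, nsample) index array gives that shape.
    return a[idx]


rng = np.random.default_rng(12345)
samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6, rng=rng)
print(samples.shape)  # (6, 3)
```

This replaces the Python-level loop over `np.random.choice(..., replace=False, size=3)` with one vectorized call; the trade-off is that it allocates a `(size, len(a))` key array, which is wasteful when `nsample` is much smaller than `len(a)`.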


Yes, tensorflow-probability includes broadcasting of the parameters and
generating multiple variates in one call, but note that your example is not
sampling without replacement.  For sampling 3 items without replacement
from a population, the *event_shape* (to use tensorflow-probability
terminology) would have to be (3,).
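To make the batch/event distinction concrete without relying on TensorFlow, here is a NumPy sketch (assuming NumPy >= 1.20, which provides `Generator.permuted`). Both arrays have shape (6, 3), but they mean different things. In the first, each of the 3 columns is an independent draw from a uniform 5-category distribution (batch_shape=(3,), event_shape=()), so a row can repeat a value, as `[1, 2, 1]` does in the TFP output above. In the second, each row is a single event of shape (3,) drawn without replacement, so its entries are always distinct.

```python
import numpy as np

rng = np.random.default_rng(2018)
ncat, nsample, size = 5, 3, 6

# Analogue of Categorical(tf.zeros((3, 5))).sample(6): three independent
# uniform categoricals over {0, ..., 4}, six draws each.
# batch_shape=(3,), event_shape=(): entries within a row are independent.
independent = rng.integers(0, ncat, size=(size, nsample))

# Sampling 3 categories without replacement: each row is ONE event of
# shape (3,). Shuffle every row of a (size, ncat) table independently and
# keep the first nsample columns.
table = np.tile(np.arange(ncat), (size, 1))
without_replacement = rng.permuted(table, axis=1)[:, :nsample]

print(independent.shape, without_replacement.shape)  # (6, 3) (6, 3)
```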

Warren


_______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

