[Numpy-discussion] align `choices` and `sample` with Python `random` module

Stephan Hoyer shoyer at gmail.com
Tue Dec 11 14:26:34 EST 2018


On Tue, Dec 11, 2018 at 10:39 AM Warren Weckesser <warren.weckesser at gmail.com> wrote:

> There is no bug, just a limitation in the API.
>
> When I draw without replacement, say, three values from a collection of
> length five, the three values that I get are not independent.  So really,
> this is *one* sample from a three-dimensional (discrete-valued)
> distribution.  The problem with the current API is that I can't get
> multiple samples from this three-dimensional distribution in one call.  If
> I need to repeat the process six times, I have to use a loop, e.g.:
>
>     >>> samples = [np.random.choice([10, 20, 30, 40, 50], replace=False, size=3)
>     ...            for _ in range(6)]
>
> With the `select` function I described in my previous email, which I'll
> call `random_select` here, the parameter that determines the number of
> items per sample, `nsample`, is separate from the parameter that determines
> the number of samples, `size`:
>
>     >>> samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
>     >>> samples
>     array([[30, 40, 50],
>            [40, 50, 30],
>            [10, 20, 40],
>            [20, 30, 50],
>            [40, 20, 50],
>            [20, 10, 30]])
>
> (`select` is a really bad name, since `numpy.select` already exists and is
> something completely different.  I had the longer name `random.select` in
> mind when I started using it. "There are only two hard problems..." etc.)
>
> Warren
>
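For reference, the behaviour described in the quoted text can be sketched in
plain NumPy. This is only an illustration, not Warren's implementation:
`random_select` and its `nsample`/`size` parameters come from his description,
while the `rng` argument and the argsort-based approach are my assumptions.
It also uses `np.random.default_rng`, which needs NumPy 1.17 or later.

```python
import numpy as np


def random_select(items, nsample, size, rng=None):
    """Draw `size` samples, each of `nsample` distinct items (hypothetical sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    items = np.asarray(items)
    if nsample > items.shape[0]:
        raise ValueError("nsample cannot exceed the number of items")
    # One row of independent uniform keys per sample; argsorting a row
    # yields a uniformly random permutation of the item indices.
    keys = rng.random((size, items.shape[0]))
    idx = np.argsort(keys, axis=1)[:, :nsample]
    # Fancy indexing gives an array of shape (size, nsample).
    return items[idx]


samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
print(samples.shape)  # (6, 3)
```

Each row is one draw without replacement, and the rows are independent of each
other. Sorting full rows of keys costs more than necessary when `nsample` is
much smaller than the collection, so treat this as a readable baseline only.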

This is an issue for the probability distributions from scipy.stats, too.

The only library I know of that handles this well is TensorFlow Probability,
which distinguishes "batch" and "event" dimensions in its distributions. It's
actually pretty comprehensive, and makes it easy to express these sorts of
operations:

>>> import tensorflow_probability as tfp
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> dist = tfp.distributions.Categorical(tf.zeros((3, 5)))
>>> dist
<tfp.distributions.Categorical 'Categorical/' batch_shape=(3,) event_shape=() dtype=int32>
>>> dist.sample(6)
<tf.Tensor: id=299, shape=(6, 3), dtype=int32, numpy=
array([[1, 2, 1],
       [2, 1, 3],
       [4, 4, 2],
       [0, 1, 1],
       [0, 2, 2],
       [2, 0, 4]], dtype=int32)>
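
To make the shape convention concrete: with all-zero logits, each of the three
batch members is a uniform categorical over five classes, so the call above
returns an array of shape sample_shape + batch_shape. A rough NumPy equivalent,
assuming NumPy 1.17 or later for `np.random.default_rng` (the variable names
are mine, not part of either library):

```python
import numpy as np

rng = np.random.default_rng()

sample_shape = (6,)   # how many independent draws were requested
batch_shape = (3,)    # how many distributions are batched together
n_categories = 5      # uniform over {0, ..., 4}, like zero logits

# Result shape is sample_shape + batch_shape; event_shape is () here
# because each individual draw is a scalar category index.
draws = rng.integers(0, n_categories, size=sample_shape + batch_shape)
print(draws.shape)  # (6, 3)
```

A draw without replacement of three items would instead have event_shape (3,),
with the sample dimensions again placed in front.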