[Numpy-discussion] NEP: Random Number Generator Policy

Sun Jun 10 23:10:16 EDT 2018

On Sun, Jun 10, 2018 at 5:57 PM, Robert Kern <robert.kern at gmail.com> wrote:

> On Sun, Jun 10, 2018 at 5:47 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >
> > On Sun, Jun 3, 2018 at 9:23 PM, Warren Weckesser <warren.weckesser at gmail.com> wrote:
>
> >> I suspect many of the tests will be easy to update, so fixing 300 or so
> >> tests does not seem like a monumental task.
> >
> > It's not monumental, but it all adds up quickly. In addition to changing
> > tests, one will also need compatibility code when supporting multiple numpy
> > versions (e.g. scipy would then get a copy of RandomStable in
> > scipy/_lib/_numpy_compat.py).
> >
> > A quick count of just np.random.seed occurrences with
> > ``$ grep -roh --include \*.py np.random.seed . | wc -w`` for some packages:
> > numpy: 77
> > scipy: 462
> > matplotlib: 204
> > statsmodels: 461
> > pymc3: 36
> > scikit-image: 63
> > scikit-learn: 69
> > keras: 46
> > pytorch: 0
> > tensorflow: 368
> > astropy: 24
> >
> > And note, these are *not* incorrect/broken usages, this is code that
> > works and has done so for years.
>
> Yes, some of them are incorrect and broken. Failure can be difficult to
> detect. This module from keras is particularly problematic:
>
>   https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/image.py
>

You have to appreciate that we're not all thinking at lightning speed and
in the same direction. If there is a difficult-to-detect problem, it would be
useful to give a brief code example (or even a line of reasoning) showing how
this actually breaks something.
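
For instance, something along these lines would already help. This is a
made-up sketch, not code from the linked module: `augment` is a hypothetical
helper that reseeds the global generator as a side effect, so the caller's
own seed silently stops determining the numbers drawn afterwards.

```python
import numpy as np

def augment(x, seed=None):
    # Hypothetical library helper (an assumption for illustration):
    # it reseeds the *global* generator before drawing noise.
    if seed is not None:
        np.random.seed(seed)
    return x + np.random.uniform(-1, 1, size=x.shape)

# The caller seeds once and expects the whole run to follow from this seed.
np.random.seed(12345)
a = np.random.uniform(size=3)
augment(np.zeros(3), seed=0)      # side effect: global state is now reset
b = np.random.uniform(size=3)     # no longer a function of seed 12345

# The same numbers as `b` come out for any caller seed, because only the
# helper's seed matters from here on.
np.random.seed(0)
np.random.uniform(-1, 1, size=3)
c = np.random.uniform(size=3)
caller_seed_ignored = np.array_equal(b, c)
```

Nothing fails loudly here; the results are just less independent than the
caller believes.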

>
> > Conclusion: the current proposal will cause work for the vast majority
> > of libraries that depend on numpy. The total amount of that work will
> > certainly not be counted in person-days/weeks, and more likely in years
> > than months. So I'm not convinced yet that the current proposal is the best
> > way forward.
>
> The mere usage of np.random.seed() doesn't imply that these packages
> actually require stream-compatibility. Some might, for sure, like where
> they are used in the unit tests, but that's not what you counted. At best,
> these numbers just mean that we can't eliminate np.random.seed() in a new
> system right away.
>

Well, mere usage has been called an antipattern (by you as well), plus
for scipy over half of the usages do give test failures (Warren's quick
test). So I'd say that counting usages is a decent proxy for the work that
has to be done.
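
As a rough sketch of the two kinds of usage being distinguished (the function
name `noisy_mean` is made up for illustration): the first test only needs
seeding to be repeatable, while the second hard-codes a value recorded from
the current stream and would need updating if the stream changed.

```python
import numpy as np

def noisy_mean(n):
    # Hypothetical function under test; draws from the global generator.
    return np.random.normal(size=n).mean()

# Style 1: only needs reproducibility. Reseeding gives the same draws,
# whatever numbers the underlying algorithm happens to produce.
np.random.seed(0)
first = noisy_mean(100)
np.random.seed(0)
second = noisy_mean(100)
reproducible = (first == second)

# Style 2: needs stream compatibility. The expected value was recorded
# from the existing implementation of seed() and normal().
np.random.seed(0)
draw = np.random.normal()
stream_dependent = np.isclose(draw, 1.76405235)
```

A grep for np.random.seed finds both styles; only the second one breaks when
the stream changes.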

Cheers,
Ralf