[Numpy-discussion] NEP: Random Number Generator Policy

Warren Weckesser warren.weckesser at gmail.com
Mon Jun 4 00:23:23 EDT 2018


On Sun, Jun 3, 2018 at 11:20 PM, Ralf Gommers <ralf.gommers at gmail.com>
wrote:

>
>
> On Sun, Jun 3, 2018 at 6:54 PM, <josef.pktd at gmail.com> wrote:
>
>>
>>
>> On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern <robert.kern at gmail.com>
>> wrote:
>>
>>> On Sun, Jun 3, 2018 at 5:46 PM <josef.pktd at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern <robert.kern at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> The list of ``StableRandom`` methods should be chosen to support unit
>>>>>> tests:
>>>>>>
>>>>>>     * ``.randint()``
>>>>>>     * ``.uniform()``
>>>>>>     * ``.normal()``
>>>>>>     * ``.standard_normal()``
>>>>>>     * ``.choice()``
>>>>>>     * ``.shuffle()``
>>>>>>     * ``.permutation()``
>>>>>>
>>>>>
>>>>> https://github.com/numpy/numpy/pull/11229#discussion_r192604311
>>>>> @bashtage writes:
>>>>> > standard_gamma and standard_exponential are important enough to be
>>>>> included here IMO.
>>>>>
>>>>> "Importance" was not my criterion, only whether they are used in unit
>>>>> test suites. This list was just off the top of my head for methods that I
>>>>> think were actually used in test suites, so I'd be happy to be shown live
>>>>> tests that use other methods. I'd like to be a *little* conservative about
>>>>> what methods we stick in here, but we don't have to be *too* conservative,
>>>>> since we are explicitly never going to be modifying these.
>>>>>
>>>>
>>>> That's one area where I thought the selection is too narrow.
>>>> We should be able to get a stable stream from the uniform for some
>>>> distributions.
>>>>
>>>> However, according to the Wikipedia description, Poisson doesn't look
>>>> easy. I just wrote a unit test for statsmodels using Poisson random
>>>> numbers with hard-coded numbers for the regression tests.
>>>>
>>>
>>> I'd really rather people do this than use StableRandom; this is best
>>> practice, as I see it, if your tests involve making precise comparisons to
>>> expected results.
>>>
>>
>> I hardcoded the results, not the random data. So the unit tests rely on a
>> reproducible stream of Poisson random numbers.
>> I don't want to save 500 (100 or 1000) observations in a csv file for
>> every variation of the unit test that I run.
>>
>
> I agree, hardcoding numbers in every place where seeded random numbers are
> now used is quite unrealistic.
>
> It may be worth having a look at the test suites for scipy, statsmodels,
> scikit-learn, etc. and estimating how much work this NEP causes those
> projects. If the devs of those packages are forced to do large-scale
> migrations from RandomState to StableRandom, then why not instead keep
> RandomState and just add a new API next to it?
>
>

As a quick and imperfect test, I monkey-patched numpy so that a call to
numpy.random.seed(m) actually uses m+1000 as the seed (a sketch of the
patch is included after the results below).  I ran the scipy tests using
the `runtests.py` script:

*seed+1000, using 'python runtests.py -n' in the source directory:*

  236 failed, 12881 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed


Most of the failures are in scipy.stats:

*seed+1000, using 'python runtests.py -n -s stats' in the source directory:*

  203 failed, 1034 passed, 4 skipped, 370 deselected, 4 xfailed, 1 xpassed


Changing the amount added to the seed or running the tests using the
function `scipy.test("full")` gives different (but similar in magnitude)
results:

*seed+1000, using 'import scipy; scipy.test("full")' in an ipython shell:*

  269 failed, 13359 passed, 1271 skipped, 134 xfailed, 8 xpassed

*seed+1, using 'python runtests.py -n' in the source directory:*

  305 failed, 12812 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed


I suspect many of the tests will be easy to update, so fixing 300 or so
tests does not seem like a monumental task.  I haven't looked into why
there are 585 deselected tests; maybe there are many more tests lurking
there that will have to be updated.
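
For reference, here is a minimal sketch of the kind of monkey patch I used
(the wrapper name is just illustrative, and the patch of course has to be
in effect before the tests call seed):

    import numpy as np

    _original_seed = np.random.seed

    def _shifted_seed(seed=None):
        # Forward to the real seed function, offsetting any explicit seed
        # so that np.random.seed(m) actually seeds the global state with
        # m + 1000.
        if seed is None:
            return _original_seed()
        return _original_seed(seed + 1000)

    np.random.seed = _shifted_seed

Anything that seeds a np.random.RandomState instance directly bypasses
this, so it only catches tests that go through the global numpy.random
state; that is part of why I call it a quick and imperfect test.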

Warren



> Ralf
>
>
>
>>
>>
>>>
>>> StableRandom is intended as a crutch so that the pain of moving existing
>>> unit tests away from the deprecated RandomState is less onerous. I'd really
>>> rather people write better unit tests!
>>>
>>> In particular, I do not want to add any of the integer-domain
>>> distributions (aside from shuffle/permutation/choice) as these are the ones
>>> that have the platform-dependency issues with respect to 32/64-bit `long`
>>> integers. They'd be unreliable for unit tests even if we kept them stable
>>> over time.
>>>
>>>
>>>> I'm not sure which other distributions are common enough and not easily
>>>> reproducible by transformation. E.g. negative binomial can be reproduced
>>>> by a gamma-Poisson mixture.
>>>>
>>>> On the other hand, normal can be easily recreated from standard_normal.
>>>>
>>>
>>> I was mostly motivated by making it a bit easier to mechanically replace
>>> uses of randn(), which is probably even more common than normal() and
>>> standard_normal() in unit tests.
>>>
>>>
>>>> Would it be difficult to keep this list large, given that it should be
>>>> frozen, low-maintenance code?
>>>>
>>>
>>> I admit that I had in mind non-statistical unit tests. That is, tests
>>> that didn't depend on the precise distribution of the inputs.
>>>
>>
>> The problem is that the unit tests in `stats` rely on precise inputs (up
>> to some numerical noise).
>> For example, p-values themselves are uniformly distributed if the
>> hypothesis test works correctly. That means if I don't have control over
>> the inputs, then my p-value could be anything in (0, 1). So we either need
>> a real dataset, save all the random numbers in a file, or have a
>> reproducible set of random numbers.
>>
>> 95% of the unit tests that I write are for statistics. A large fraction
>> of them don't rely on the exact distribution, but do rely on random
>> numbers that are "good enough".
>> For example, when writing unit tests, every once in a while (or sometimes
>> more often) I get a "bad" stream of random numbers, for which convergence
>> might fail or where the estimated numbers are far away from the true
>> numbers, so the test tolerance would have to be very high.
>> If I pick one of the seeds that looks good, then I can use a tighter unit
>> test tolerance to ensure results are good in a nice case.
>>
>> The problem is that we cannot write robust regression unit tests without
>> stable inputs.
>> E.g. I verified my results with a Monte Carlo run with 5000 replications
>> and 1000 Poisson observations in each.
>> The results look close to what is expected and won't depend much on the
>> exact stream of random variables.
>> But the Monte Carlo for each variant of the test took about 40 seconds.
>> Doing this for all option combinations and dataset specifications takes
>> too long to be feasible in a unit test suite.
>> So I rely on numpy's stable random numbers and hard-code the results for
>> a specific random sample in the regression unit tests.
>>
>> Josef
>>
>>
>>
>>>
>>> --
>>> Robert Kern
>>>