[scikit-learn] NEP: Random Number Generator Policy

josef.pktd at gmail.com josef.pktd at gmail.com
Sat Jun 16 20:42:12 EDT 2018


On Sat, Jun 16, 2018 at 8:29 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On 6/16/18 05:54, josef.pktd at gmail.com wrote:
>>
>> On Sat, Jun 16, 2018 at 3:59 AM, Robert Kern <robert.kern at gmail.com>
>> wrote:
>>>
>>> I have made a significant revision. In this version, downstream projects
>>> like scikit-learn should experience significantly less forced churn.
>>>
>>>
>>> https://github.com/rkern/numpy/blob/nep/rng-clarification/doc/neps/nep-0019-rng-policy.rst
>>>
>>> https://mail.python.org/pipermail/numpy-discussion/2018-June/078252.html
>>>
>>> tl;dr RandomState lives! But its distributions are forever frozen. So
>>> maybe
>>> "undead" is more apt. Anyways, RandomState will continue to provide the
>>> same
>>> stream-compatibility that it always has. But it will be internally
>>> refactored to use the same core uniform PRNG objects that the new
>>> RandomGenerator distributions class will use underneath (defaulting to
>>> the
>>> current Mersenne Twister, of course). The distribution methods on
>>> RandomGenerator will be allowed to evolve with numpy versions and get
>>> better/faster implementations.
>>>
>>> Your code can mix the usage of RandomState and RandomGenerator as needed,
>>> but they can be made to share the same underlying RNG algorithm's state.
>>
>>
>>
>> Sounds good to me, and I think handles all our concerns.
>>
>> I also think that the issues behind the np.random.* section about the
>> global state and seed can be revisited for possible deprecation of
>> convenience features.
>>
>> One clarifying question, mainly to see IIUC
>>
>> in this quote
>> """
>> Calling numpy.random.seed() thereafter SHOULD just pass the given seed
>> to the current basic RNG object and not attempt to reset the basic RNG
>> to the Mersenne Twister. The global RandomState instance MUST be
>> accessible by the name numpy.random.mtrand._rand
>> """
>>
>> "the current basic RNG object" refers to the global object. AFAIU, it
>> is possible to change it numpy.random.mtrand._rand. Is it?
>
>
> numpy.random.mtrand._rand would not be a basic RNG object; it would be (as
> it is now) a RandomState instance. "the current basic RNG object" would be
> the basic RNG that that global RandomState instance is currently using.
>
> It is not possible (now or in the glorious NEP future) to assign a new
> instance to numpy.random.mtrand._rand. All of the numpy.random.* functions
> are actually just simple aliases to the methods on that object when the
> module is first built. Re-assigning _rand wouldn't reassign those aliases.
> numpy.random.standard_normal(), for instance, would still be the
> .standard_normal() method on the RandomState instance that _rand initially
> pointed to.
>
> Currently and under the NEP, the only way to modify
> numpy.random.mtrand._rand is to call its methods (i.e. the numpy.random.*
> convenience functions) to modify its internal state. That's not changing.
>
> The only thing that will change will be that there will be a new
> numpy.random.* function to call that will let you give the global
> RandomState a new basic RNG object that it will swap in internally. Let's
> call it np.random.swap_global_basic_rng(). If you don't use that function,
> you won't have a problem. I intend to make this new function *very* explicit
> about what it is doing, and document the crap out of it so it won't be
> misused like np.random.seed() is.

I didn't catch that part. Now it's clear.

>
>> I never tried that so I didn't know we can change the global
>> RandomState, and thought we will have to replace np.random.seed usage >
>> with a specific RandomState(seed) instance
>
>
> I did a quick review of np.random.seed() usage in statsmodels, and I think
> you are mostly fine. It looks like you mostly use it in unit tests and at
> the top of examples. The only possible problem that I can see that you might
> have with the swap_global_basic_rng() is if some other package that you rely
> on calls it in its library code. Then subsequent statsmodels unit tests
> might fail because when they call np.random.seed(), it won't be reseeding a
> Mersenne Twister but another basic RNG.
>
> However, I intend to make that a weird and unnatural thing to do. It's
> already unlikely to happen as it's a niche requirement that one mostly would
> need at the start of a whole program, not buried down inside library code.
> But we will also document that function to discourage such usage, and
> probably have unconditional noisy warnings that users would have to
> explicitly silence.
>
> If one of your dependencies did that, you'd be well within your rights to
> tell them that they are misusing numpy and causing breakage in statsmodels.
>
>> In loose analogy:
>>
>> Matplotlib has a "global" current figure and axis, gca, gcf.
>> In statsmodels we avoid any access to and usage of it and only work
>> with individual figure/axis instances that can be provided by the
>> user. (except for maybe some documentation examples and maybe some
>> "legacy" code.)
>> (
>> https://github.com/statsmodels/statsmodels/blob/master/statsmodels/graphics/utils.py#L48
>> )
>>
>> AFAICS, essentially, statsmodels will need a similar policy for
>> RandomState/RandomGenerator and give up the usage of the global random
>> instance.
>
>
> I mean, you certainly *should* (outside of unit tests) for very similar
> reasons why you avoid the global state in matplotlib, but this NEP won't
> force you to. You should do so anyways under the status quo, too. For any of
> your functions that call np.random.* functions internally, it's hard to use
> them in threaded applications, for instance, because it is relying on that
> global state.
>
> scikit-learn's check_random_state() is a good pattern to follow.

Thanks for the clarification.

I just realized that I had replied to scikit-learn mailing list.
I had thought this was numpy-discussion.

sorry about that.

Josef


>
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
>  that is made terrible by our own mad attempt to interpret it as though it
> had
>  an underlying truth."
>   -- Umberto Eco
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn


More information about the scikit-learn mailing list