[Numpy-discussion] Design feedback solicitation

Thu Jul 14 22:14:53 EDT 2016

On Fri, Jul 15, 2016 at 2:53 AM, Pavlyk, Oleksandr <
oleksandr.pavlyk at intel.com> wrote:
>
> Hi Robert,
>
> Thank you for the pointers.
>
> I think numpy.random should have a mechanism to choose between methods
> for generating the underlying randomness dynamically, at run time, as
> well as an extensible framework where developers could add more methods.
> The default would be MT19937 for backwards compatibility. It is important
> to be able to do this at run time, as it would allow one to use different
> algorithms in different threads (like different members of the parallel
> Mersenne Twister family of generators, see MT2203).
>
> The framework should allow defining randomness as a bit stream, a stream
> of fixed-size integers, or a stream of uniform reals (32 or 64 bits). This
> is a lot like MKL’s abstract method for basic pseudo-random number
> generation.
>
> Each method should provide routines to sample from uniform distributions
> over reals (in floats and doubles), as well as over integers.
>
> All remaining non-uniform distributions build on top of these uniform
> streams.

ng-numpy-randomstate does all of these.
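
For readers unfamiliar with that layering, here is a minimal pure-Python
sketch of the design described in the quoted text. It is not
ng-numpy-randomstate's code or API: the names Xorshift32, Lcg32,
uniform_double and exponential are made up for illustration, and the two
tiny core generators are toys chosen only because they are short.

```python
import math


class Xorshift32:
    """Hypothetical core generator: a stream of 32-bit integers."""

    def __init__(self, seed):
        self.state = (seed & 0xFFFFFFFF) or 1  # state must be nonzero

    def next_uint32(self):
        x = self.state
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        self.state = x
        return x


class Lcg32:
    """A second hypothetical core with the same one-method interface."""

    def __init__(self, seed):
        self.state = seed & 0xFFFFFFFF

    def next_uint32(self):
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state


def uniform_double(core):
    """Uniform double in [0, 1) built from 53 random bits (two draws)."""
    hi = core.next_uint32() >> 5  # keep 27 bits
    lo = core.next_uint32() >> 6  # keep 26 bits
    return (hi * 67108864 + lo) / 9007199254740992.0  # (hi*2**26 + lo) / 2**53


def exponential(core, scale=1.0):
    """Non-uniform variate built only on the uniform stream (inversion)."""
    return -scale * math.log(1.0 - uniform_double(core))
```

Any object with a next_uint32() method can be passed to uniform_double()
or exponential(), which is the sense in which the non-uniform
distributions sit on top of an interchangeable core.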

> I think it is pretty important to refactor numpy.random to allow the
> underlying generators to produce a given number of independent variates at
> a time. There could be convenience wrapper functions that return a single
> variate for backwards compatibility, but this change in design would allow
> for better efficiency, as sampling a vector of random variates at once is
> often faster than repeatedly sampling one at a time due to set-up cost,
> vectorization, etc.

The underlying C implementation is an implementation detail, so the
refactoring that you suggest has no backwards compatibility constraints.
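
To illustrate the shape of such a refactoring, here is a rough Python
sketch (the real work would be done in C). The class BlockUniform and its
methods fill and next_double are hypothetical names, and the standard
library's random.Random merely stands in for the core generator.

```python
import random


class BlockUniform:
    """Hypothetical generator whose primitive operation fills a block."""

    def __init__(self, seed):
        self._core = random.Random(seed)  # stand-in for the core generator

    def fill(self, n):
        """Return n uniform doubles in [0, 1); set-up is done once per call."""
        draw = self._core.random  # attribute lookup hoisted out of the loop
        return [draw() for _ in range(n)]

    def next_double(self):
        """One-at-a-time convenience wrapper around the block primitive."""
        return self.fill(1)[0]
```

With the same seed, fill(5) and five calls to next_double() yield the same
values, so the public one-variate interface can stay unchanged while the
internals work on blocks.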

> Finally, methods to sample a particular distribution should uniformly
> support a method keyword argument. Because method names vary from
> distribution to distribution, it should ideally be programmatically
> discoverable which methods are supported for a given distribution. For
> instance, the standard normal distribution could support
> method='Inversion', method='Box-Muller', method='Ziggurat',
> method='Box-Muller-Marsaglia' (the one used in numpy.random right now), as
> well as a bunch of unnamed methods based on the transformed rejection method
> (see http://statistik.wu-wien.ac.at/anuran/ )

That is one of the items under discussion. I personally prefer that one
simply exposes named methods for each different scheme (e.g.
ziggurat_normal(), etc.).
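
The two options can be compared side by side in a small Python sketch.
Everything here is hypothetical (the Sampler class, boxmuller_normal,
inversion_normal, the NORMAL_METHODS table); none of it is an existing
numpy.random API. random.Random stands in for the uniform source, and
statistics.NormalDist (Python 3.8+) supplies the inverse CDF.

```python
import math
import random
from statistics import NormalDist


class Sampler:
    """Hypothetical sampler exposing one named method per scheme."""

    def __init__(self, seed):
        self._u = random.Random(seed)

    def _open_uniform(self):
        # Uniform on (0, 1): reject the (rare) exact 0.0.
        u = self._u.random()
        while u == 0.0:
            u = self._u.random()
        return u

    def boxmuller_normal(self):
        r = math.sqrt(-2.0 * math.log(self._open_uniform()))
        return r * math.cos(2.0 * math.pi * self._u.random())

    def inversion_normal(self):
        return NormalDist().inv_cdf(self._open_uniform())

    # The keyword-argument alternative: a discoverable table of schemes.
    NORMAL_METHODS = {
        "Box-Muller": boxmuller_normal,
        "Inversion": inversion_normal,
    }

    def standard_normal(self, method="Box-Muller"):
        return self.NORMAL_METHODS[method](self)
```

With named methods the available schemes are visible through ordinary
introspection of the class; with the keyword form, sorted(Sampler.NORMAL_METHODS)
plays the role of the programmatic discovery asked for in the quoted text.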

> It would also be good if one could dynamically register a new method to
> sample from a non-uniform distribution. This would allow, for instance,
> automatically adding methods that sample certain non-uniform distributions
> by directly calling into MKL (or another library), when available, instead
> of building them from uniforms (which may remain a fall-through method).
>
> The linked project is a good start, but the choice of the underlying
> algorithm needs to be made at run time,

That's what happens. You instantiate the RandomState class that you want.
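
As a rough analogy using only the Python standard library (this is not
the linked project's API), the pattern looks like the sketch below:
random.Random and random.SystemRandom share an interface, and each thread
instantiates whichever class it was told to use. The GENERATORS table and
the worker and run functions are hypothetical names.

```python
import random
import threading

# Stand-ins for interchangeable generator classes; in the project under
# discussion these would be its per-algorithm RandomState classes.
GENERATORS = {
    "mt19937": random.Random,       # Mersenne Twister
    "system": random.SystemRandom,  # OS entropy source
}


def worker(kind, out, index):
    rng = GENERATORS[kind]()  # class picked at run time, one per thread
    out[index] = [rng.random() for _ in range(3)]


def run(kinds):
    out = [None] * len(kinds)
    threads = [threading.Thread(target=worker, args=(k, out, i))
               for i, k in enumerate(kinds)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Calling run(["mt19937", "system"]) gives each thread its own generator
instance of a different class, with the choice made by a string at run
time.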

> as far as I understood, and the only provided interface to query random
> variates is one at a time, just as is currently the case
> in numpy.random.

--
Robert Kern