[SciPy-Dev] SciPy-Dev Digest, Vol 193, Issue 12

Robert Kern robert.kern at gmail.com
Fri Nov 15 23:16:21 EST 2019


On Fri, Nov 15, 2019 at 10:55 PM Warren Weckesser <
warren.weckesser at gmail.com> wrote:

> On 11/15/19, Robert Kern <robert.kern at gmail.com> wrote:
> > On Fri, Nov 15, 2019 at 10:03 PM <rlucas7 at vt.edu> wrote:
> >
> >> > On Fri, Nov 15, 2019 at 4:01 PM Andrew Reed <reed at cs.unc.edu> wrote
> >> >
> >> >> I imagine that implementing a new method for rvs will probably
> >> >> break
> >> >> the repeatability of previous versions of SciPy, and I'm not sure if
> >> this
> >> >> is a distribution that warrants optimization.
> >> >>
> >> >
> >> > I don't think we've ever made such guarantees for the `rvs()`
> >> distribution
> >> > methods. In any case, moving from the inefficient default inversion to
> >> > a
> >> > reasonably efficient sampling algorithm would win the argument over
> >> > stability.
> >>
> >> I'm not sure about the implementation, but we would probably still
> >> want to check some extreme parameter cases to make sure things don't
> >> break down because of numerical over/underflow -- at least no worse
> >> than the current behavior.
> >>
> >> I think the rvs() method unit tests assume a single uniform draw for
> >> each sampled value. If you are using something like the difference of
> >> two exponential random variables with the same rate/scale, you'll
> >> need to turn off that check, IIRC. Last I checked there are some
> >> existing examples of that in the tests.
> >>
> >
> > I'm not sure what you are referring to. Can you show me an example of
> > that?
> > That's certainly not a restriction the rvs() methods obey, so a unit test
> > expecting that is incorrect and should be fixed.
>
>
> I think Lucas is referring to this test of broadcasting in the rvs()
> method:
>
>
> https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L253
>
> This test was added when I worked on ensuring that all the rvs()
> methods correctly broadcast their arguments.  For many distributions,
> this test was a simple way to verify that broadcasting was working
> correctly.  The comment in the test explains the context.  For
> distributions that use just one random value from the underlying numpy
> random generator per call to rvs(), the test checks that a call using
> broadcasting returns the same result as calling rvs() element-wise
> through numpy.vectorize.  As explained in the comment,
> distributions for which that assumption is not true must be included
> in the list that defines the boolean variable 'shape_only' in the
> test. (For these, just the shape of the result is checked, not the
> actual values.)  There is no *general* assumption that rvs() must use
> just one random variate from the numpy generator, and (as explained in
> the comment) there is no expectation that an rvs() method that uses
> one random variate now can't be changed to use more later.  Just
> update the test if such a change is made.
>

Ah, I see. I think the wording is a little confusing, since many of the
RandomState distribution methods themselves draw more than one random value
from the underlying PRNG to compute their result. This confusion may be
limited to myself, though, since I wrote most of them and still think about
them at that level. The restriction here seems more accurately (if less
felicitously) phrased as "makes more than one RandomState method call".
Because then their interleaving of PRNG draws won't match the interleaving
that vectorize would do.
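
The interleaving point can be sketched with a toy example. The two helper
functions below are hypothetical, not SciPy internals: one makes a single
RandomState call per batch (inverse-CDF sampling of an exponential), the
other makes two (a Laplace built as the difference of two exponentials,
as mentioned above). Only the first produces identical values under a
shared seed whether you call it broadcast or element-wise.

```python
import numpy as np

def rvs_one_draw(scale, size, random_state):
    # One RandomState call, one uniform per variate:
    # inverse-CDF sampling of an exponential distribution.
    u = random_state.uniform(size=size)
    return -scale * np.log1p(-u)

def rvs_two_draws(scale, size, random_state):
    # Two RandomState calls per batch: a Laplace variate as the
    # difference of two exponentials with the same scale.
    u1 = random_state.uniform(size=size)
    u2 = random_state.uniform(size=size)
    return -scale * np.log1p(-u1) + scale * np.log1p(-u2)

scales = np.array([1.0, 2.0, 3.0])

# One-draw sampler: broadcast call vs element-wise calls against a
# fresh generator with the same seed consume the PRNG stream in the
# same order, so the results match exactly.
broadcast = rvs_one_draw(scales, size=3,
                         random_state=np.random.RandomState(123))
rs = np.random.RandomState(123)
elementwise = np.array([rvs_one_draw(s, size=None, random_state=rs)
                        for s in scales])
assert np.allclose(broadcast, elementwise)

# Two-draw sampler: the broadcast call draws u1[0..2] then u2[0..2],
# while the element-wise loop draws u1, u2, u1, u2, ... -- the
# interleaving differs, so the values no longer agree.
broadcast2 = rvs_two_draws(scales, size=3,
                           random_state=np.random.RandomState(123))
rs = np.random.RandomState(123)
elementwise2 = np.array([rvs_two_draws(s, size=None, random_state=rs)
                         for s in scales])
assert not np.allclose(broadcast2, elementwise2)
```

This is exactly why a distribution whose rvs() makes more than one
RandomState method call has to go in the `shape_only` list: the values
can't be expected to match what numpy.vectorize would produce.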

I still don't really like that part of the test. I think that I would
prefer that all distributions would be treated as "shape_only". Did it help
us solve a bug once?

-- 
Robert Kern