[SciPy-Dev] proper way to test distributions

Vincent Davis vincent at vincentdavis.net
Mon Jun 14 23:31:11 EDT 2010


On Mon, Jun 14, 2010 at 9:26 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Mon, Jun 14, 2010 at 22:07, Vincent Davis <vincent at vincentdavis.net> wrote:
>> I was reviewing how the tests of distributions are done in scipy,
>> with the thought of applying the same methods to numpy.random. I
>> have a lot to learn here and appreciate your suggestions.
>>
>> Link to the scipy test
>> http://github.com/pv/scipy-work/blob/master/scipy/stats/tests/test_continuous_basic.py
>>
>> If I understand correctly, the tests create a sample of 2000 from a
>> given distribution and then compare stats (mean, var, ...) calculated
>> with numpy functions against those returned by the distribution
>> instance's .stats method. I am not sure how the mean is calculated
>> within the distribution (is it just using the scipy mean?). Anyway,
>> this seems a little circular.
>>
>> Maybe I am missing something, but here are my thoughts.
>>
>> 1) Using seed() and then comparing the actual results (arrays) helps
>> to make sure the code is stable, but tells you nothing about the
>> quality of the distribution.
>>
>> 2) Using seed() and then calculating the moments (with numpy and
>> dist.stats) is not really any different from (1).
>>
>> 3) Drawing a large sample (possibly using seed()), calculating the
>> moments, and comparing them to the theoretical moments seems like
>> the best option. But this could be slow.
>>
>> What is the best way?
>> What is desired in numpy?
>
> While it's worthwhile to have both, you really only want (1) in the
> standard unit test suite. (3) is good for working out the bugs in the
> initial implementation (or retroactively doing so after the grad
> student who wrote the initial implementation suddenly ran off and got
> a real job. <ahem>). You can provide the (3)-type tests if you wish
> to do that verification, but they don't need to be in the main test
> suite. (1)
> provides the first layer of protection. If we make an unintentional
> change to the results, (1) will catch it. If we make an intentional
> change, we can use (3) to verify that our changes are good. But we
> don't need to write (3) until we are actually faced with that task.
>
>> And a little off topic but isn't numpy.random duplicating scipy or
>> scipy duplicating numpy?
>
> Not really. scipy is using those routines from numpy for most of the
> duplicated distributions. numpy needed that functionality to match
> Numeric's. Of course, this means that scipy's (3)-type tests should be
> providing us coverage for many of numpy's distributions.
>

Thanks for the feedback, makes sense to me.

Vincent

> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco


