[SciPy-dev] the state of scipy unit tests

josef.pktd at gmail.com
Mon Nov 24 19:11:42 EST 2008


>
> I don't think we should be doing any K-S tests of the distributions in
> the test suite. Once we have validated that our algorithms work (using
> these tests, with large sample sizes), we should generate a small
> number of variates from each distribution using a fixed seed. The unit
> tests in the main test suite will simply generate the same number of
> variates with the same seed and directly compare the results. If we
> start to get failures, then we can recheck using the K-S tests that
> the algorithm is still good, and regenerate the reference variates.
>
> The only problem I can see is if there are platform-dependent results
> for some distributions, but that would be very good to figure out now,
> too.
>
> --
> Robert Kern
>

Currently I am using generated random variables for two purposes:

* To test whether the random number generator is correct, kstest or
something similar would be necessary, with a sample size large enough
for the test to have reasonable power, similar to the initial kstest
in the test suite (a rough sketch of this kind of check is below,
after the second point).
  (btw. there are still 2 known failures in mtrand)

* In the second type of test, I use the sample properties as a
benchmark for the theoretical properties. For this purpose any
randomness could be completely removed.
  Currently the only outside information the tests use comes from
numpy.random, e.g. I compare sample moments with theoretical moments.
If we had a benchmark for what the true theoretical values should be,
then these could be compared directly, without generating a random
sample. However, I wasn't willing to go to R and generate benchmark
data for the 100 or so distributions, so I used the sample properties.
Using sample properties and internal consistency checks between
specific and generic methods creates, I think, quite reliable tests.
For this case, we could now create our own benchmark, assuming our
algorithms are correct, and use it for regression tests. A simple
script should be able to create the benchmark data (the second sketch
below shows the idea).
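
To make the first point concrete, the check I have in mind looks
roughly like the following (just a sketch; the distribution, sample
size and threshold are placeholders, not what is actually in the
test suite):

    import numpy as np
    from scipy import stats

    # draw a large sample so the K-S test has reasonable power, then
    # compare it against the theoretical cdf of the same distribution
    np.random.seed(1234)
    sample = stats.gamma.rvs(2.5, size=10000)
    D, pval = stats.kstest(sample, 'gamma', args=(2.5,))
    assert pval > 0.01, "gamma.rvs fails kstest: D=%g, p=%g" % (D, pval)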
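
The second point, comparing sample moments against the theoretical
moments of the distribution itself, and turning the fixed-seed sample
into benchmark data, could look like this (again only a sketch;
distribution, parameters and tolerances are placeholders):

    import numpy as np
    from scipy import stats

    # with a fixed seed, compare the sample moments of the generated
    # variates with the theoretical moments of the distribution
    np.random.seed(4321)
    sample = stats.gamma.rvs(2.5, size=5000)
    m, v = stats.gamma.stats(2.5, moments='mv')
    assert np.allclose(sample.mean(), m, rtol=0.05)
    assert np.allclose(sample.var(), v, rtol=0.10)

    # the same fixed-seed variates could be saved once as benchmark
    # data, e.g. np.savetxt('gamma_2.5.txt', sample); the regression
    # test then regenerates them with the same seed and compares
    # directly against the saved values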

One disadvantage of this is that, if we want to test a distribution
with different parameter values, we still need to get the benchmark
data for the new parameters.
When I made changes, for example, to the behavior of a distribution
method at an extreme or near-corner value, I was quite glad I could
rely on my tests. I just needed to add a test case with new
parameters, and the tests checked all methods for this case without
me having to specify expected results for each method (roughly as in
the sketch below).
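
Schematically, such a generic per-case check could look like this
(only a sketch with made-up cases; the real tests cover many more
methods and corner cases):

    import numpy as np
    from scipy import stats

    # each new (distribution, parameters) pair is checked through
    # internal consistency of its methods, with no hand-specified
    # expected values
    cases = [(stats.gamma, (2.5,)), (stats.t, (5,)), (stats.lognorm, (0.5,))]

    for dist, args in cases:
        x = np.linspace(dist.ppf(0.01, *args), dist.ppf(0.99, *args), 21)
        # ppf should invert cdf on the support
        assert np.allclose(dist.ppf(dist.cdf(x, *args), *args), x, rtol=1e-6)
        # sf should be the complement of cdf
        assert np.allclose(dist.sf(x, *args), 1 - dist.cdf(x, *args), atol=1e-12)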

I don't know how everyone is handling this, but I need to keep track
of a public test suite (for those not working on distributions) and a
"development" test suite, which is much stricter and which I use when
I make changes directly to the distributions module.

But, I agree, for the purpose of a regression test suite, there is a
large amount of simplification that can be done to my (bug-hunting)
test suite.

Josef


