Probabilistic unit tests?

Sat Jan 12 13:08:33 EST 2013

On 12/01/13 08:07, alex23 wrote:
> On 11 Jan, 13:34, Steven D'Aprano <steve
> +comp.lang.pyt... at pearwood.info> wrote:
>> Well, that's not really a task for unit testing. Unit tests, like most
>> tests, are well suited to deterministic tests, but not really to
>> probabilistic testing. As far as I know, there aren't really any good
>> frameworks for probabilistic testing, so you're stuck with inventing your
>> own. (Possibly on top of unittest.)
>
> One approach I've had success with is providing a seed to the RNG, so
> that the random results are deterministic.
>

My ex-boss once instructed me to do the same thing to test functions for 
generating random variates. I used a statistical approach instead.

There are often several ways of generating data that follow a particular 
distribution. If you use a given seed so that you get a deterministic 
sequence of uniform random variates, you will get deterministic outputs 
for a specific implementation. But if you change the implementation, the 
tests are likely to fail. For example, to generate a negative exponential 
variate, either -ln(U)/lambda or -ln(1-U)/lambda will do the job correctly 
(U and 1-U are both uniform on (0, 1)), but tests written for one 
implementation would fail with the other. So each time you changed the 
implementation, you'd need to change the tests.
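A minimal sketch of that failure mode, in Python since this is a Python list. The names expovariate_a, expovariate_b and sample are hypothetical, and the rate lam=2.0 and seed 42 are arbitrary choices. Both functions are valid inverse-transform samplers, yet a test pinned to the values from one seeded run fails as soon as the implementation is swapped:

```python
import math
import random

def expovariate_a(rng: random.Random, lam: float) -> float:
    """Exponential variate via -ln(1 - U) / lam (inverse-transform sampling)."""
    # rng.random() is in [0.0, 1.0), so 1 - U is in (0.0, 1.0] and log() is safe.
    return -math.log(1.0 - rng.random()) / lam

def expovariate_b(rng: random.Random, lam: float) -> float:
    """Exponential variate via -ln(U) / lam; equally correct, since 1 - U ~ U."""
    u = rng.random()
    while u == 0.0:  # log(0) is undefined; redraw in that (very unlikely) case
        u = rng.random()
    return -math.log(u) / lam

def sample(sampler, seed: int, n: int = 5, lam: float = 2.0) -> list:
    """Draw n variates from a freshly seeded generator."""
    rng = random.Random(seed)
    return [sampler(rng, lam) for _ in range(n)]

# A seeded "golden value" test: expected values captured from implementation A.
golden = sample(expovariate_a, seed=42)

print(sample(expovariate_a, seed=42) == golden)  # True: same code, same seed
print(sample(expovariate_b, seed=42) == golden)  # False, although B is also correct
```

The second comparison fails only because B maps each uniform U to -ln(U) instead of -ln(1-U), not because B is wrong.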

I think my boss had in mind that I would write the code, seed the RNG, 
call the function a few times, then use the generated values in the 
test. That would not even have tested the original implementation. I 
would have had a test that would only have tested whether the 
implementation had changed, which I would argue is worse than no test at all. If
I'd gone to the trouble of manually calculating the expected outputs so 
that I got valid tests for the original implementation, then I would 
have had a test that would effectively just serve as a reminder to go 
through the whole manual calculation process again for any changed 
implementation.

A reasonably general statistical approach is possible. Any hypothesis 
about generated data that lends itself to statistical testing can be 
used to generate a sequence of p-values (one for each set of generated 
values) that can be checked (statistically) for uniformity. This 
effectively tests the distribution of the test statistic, so it is better 
than simply testing whether tests on generated data pass, say, 95% of 
the time (for a chosen 5% Type I error rate). Cheers.
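One way to sketch this two-level check, assuming the first-level hypothesis is "each batch follows an exponential distribution with rate lam", checked with a one-sample Kolmogorov-Smirnov test, and the second level applies the same KS test to the resulting p-values against Uniform(0, 1). The helper names, batch sizes, seed, and the approximate p-value formula (the asymptotic Kolmogorov series with the common sqrt(n) + 0.12 + 0.11/sqrt(n) small-sample adjustment) are all assumptions made to keep it stdlib-only; with SciPy installed, scipy.stats.kstest would be the more robust choice for the individual tests.

```python
import math
import random

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_pvalue(d, n, terms=100):
    """Approximate P(D >= d) via the asymptotic Kolmogorov series
    Q(t) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 t^2), with the
    small-sample adjustment t = (sqrt(n) + 0.12 + 0.11/sqrt(n)) * d."""
    root_n = math.sqrt(n)
    t = (root_n + 0.12 + 0.11 / root_n) * d
    if t < 0.2:  # Q(t) is 1 to within ~1e-12 here; the series decays too slowly
        return 1.0
    q = sum(2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
            for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)

def exp_cdf(lam):
    """CDF of the exponential distribution with rate lam."""
    def cdf(x):
        return 1.0 - math.exp(-lam * x) if x > 0 else 0.0
    return cdf

def first_level_pvalues(sampler, lam, batches=200, batch_size=500, seed=None):
    """One KS p-value per batch of generated values."""
    rng = random.Random(seed)
    cdf = exp_cdf(lam)
    pvals = []
    for _ in range(batches):
        batch = [sampler(rng, lam) for _ in range(batch_size)]
        pvals.append(ks_pvalue(ks_statistic(batch, cdf), batch_size))
    return pvals

def second_level_pvalue(sampler, lam, **kwargs):
    """KS test of the first-level p-values against Uniform(0, 1)."""
    pvals = first_level_pvalues(sampler, lam, **kwargs)
    return ks_pvalue(ks_statistic(pvals, lambda p: min(max(p, 0.0), 1.0)),
                     len(pvals))

def expovariate(rng, lam):
    """Implementation under test: -ln(1 - U) / lam."""
    return -math.log(1.0 - rng.random()) / lam

p = second_level_pvalue(expovariate, lam=2.0, seed=1)
print(f"second-level p-value: {p:.3f}")
```

A correct sampler should usually give a second-level p-value that is not tiny. A sampler with the wrong rate drives every first-level p-value to essentially zero, and the second-level p-value with it. Because the check is itself statistical, a correct implementation will still fail occasionally at whatever threshold is chosen, but nothing in the test depends on which formula produced the variates.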

Duncan