[SciPy-User] peer review of scientific software

josef.pktd at gmail.com
Thu Jun 6 10:44:20 EDT 2013


On Thu, Jun 6, 2013 at 8:59 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Thu, Jun 6, 2013 at 1:56 PM,  <josef.pktd at gmail.com> wrote:
>>>>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress.
>>>>> The first took me ages to spot, as I was assuming the error was on
>>>>> my side since scipy was seen as a "large, widely used library".
>>
>> Ok, I found the stats.linregress case
>> https://github.com/scipy/scipy/pull/433
>>
>> There is no way I can write unit tests for all the edge cases that I
>> never expect to show up.
>> You will surely find bugs/behavior like this in many packages, and I
>> wouldn't trust any package for extreme cases, no matter how large its
>> test suite is.
>
> I guess that means the user has to know what you thought an extreme case was?

Anything that gets close to machine precision in a special case
requires special attention.
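
As a small illustration (my example, not from the thread): two
mathematically equivalent formulas can give very different answers
once cancellation eats the significant digits.

import numpy as np

# Population variance two ways. The one-pass "textbook" formula
# subtracts two nearly equal numbers of order 1e16 and loses almost
# all significant digits; the centered two-pass formula is stable.
x = 1e8 + np.array([1.0, 2.0, 3.0, 4.0])

naive = np.mean(x**2) - np.mean(x)**2    # catastrophic cancellation
stable = np.mean((x - np.mean(x))**2)    # centered, well-conditioned

print(naive)   # garbage, possibly even negative
print(stable)  # 1.25, the correct value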

I assume many scipy.special distribution functions were written with
statistical tests in mind, with maybe good accuracy in the 0.0001 to
0.5 percentile range. I wouldn't trust any of them in the extreme
tails (say, 1e-30) until I have verified them. And I know in which
cases Pauli and others expanded the range with good precision.

https://github.com/scipy/scipy/issues/1489
was fixed by https://github.com/scipy/scipy/pull/2494,
but it never went high on *my* priority list.
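
A quick way to see the tail problem (my illustration): compare the
dedicated survival function against the naive 1 - cdf.

from scipy import stats

# cdf(10) rounds to exactly 1.0 in double precision, so the naive
# tail probability underflows to zero, while sf() computes the tail
# directly and keeps full relative accuracy.
print(1 - stats.norm.cdf(10))  # 0.0 -- all information lost
print(stats.norm.sf(10))       # ~7.6e-24 -- accurate tail probability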

>
> I think the point of test driven development is precisely in order to
> specify the edges before you've locked yourself down to an
> implementation.  If one writes the implementation first one often
> does forget the edges.

"A common mistake that people make when trying to design something
completely foolproof is to underestimate the ingenuity of complete
fools." DA

It's a question of priorities: I don't spend my time coming up with
edge cases where something might fail, and even then I would still
only cover 50% of the things users might run into. Some edge cases
are important, some are just a numerical curiosity.

Example: the minimum sample size for time series analysis in
statsmodels is not checked:
http://groups.google.com/group/pystatsmodels/browse_thread/thread/15bba79f7474e1b3
I have an open issue for it, but I have no idea why someone would do
time series analysis with 5 observations. It doesn't worry me enough
to drop everything and fix the "bug".

The skew and kurtosis tests in scipy.stats now enforce the correct
minimum sample size.
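
For example (exact thresholds and messages depend on the scipy
version):

import numpy as np
from scipy import stats

data = np.arange(5.0)  # deliberately too small a sample

# skewtest refuses to run instead of silently returning a
# meaningless statistic; in recent scipy it requires at least
# 8 observations.
try:
    stats.skewtest(data)
except ValueError as e:
    print(e)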

Example: almost perfect collinearity in estimating a linear
regression. The model produces nonsense, but what a statistical
package does in this case, and how close to perfect collinearity it
can get without breaking down, varies widely.
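
A minimal sketch of what that looks like numerically (my
construction, not a statsmodels example): two regressors that differ
only by noise at the 1e-10 level.

import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 1e-10 * rng.standard_normal(n)  # almost perfectly collinear
X = np.column_stack([np.ones(n), x1, x2])
y = x1 + rng.standard_normal(n)

# The design matrix is nearly rank-deficient: the condition number is
# enormous, and only the *sum* of the x1 and x2 coefficients is well
# determined -- the individual values are essentially arbitrary.
print(np.linalg.cond(X))
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)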

My priorities are usually: check that something is correct for 99.5%
of use cases and worry about the other 0.5% when they actually show
up. And sometimes we have to revise our evaluation, when an edge case
that we never thought of actually occurs pretty regularly.
(If you want an example: problems with perfect prediction in Logit
that neither Skipper nor I knew about until someone ran into them.)
http://statsmodels.sourceforge.net/stable/pitfalls.html#unidentified-parameters
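
A tiny reproduction of that kind of failure (my example; whether this
raises an error or just produces diverging coefficients depends on
the statsmodels version):

import numpy as np
import statsmodels.api as sm

# y is 1 exactly when x > 0, so the likelihood increases without
# bound as the slope goes to infinity: the parameter is unidentified.
x = np.linspace(-1, 1, 20)
y = (x > 0).astype(float)
X = sm.add_constant(x)

try:
    print(sm.Logit(y, X).fit(disp=0).params)  # huge if it "converges"
except Exception as e:  # e.g. PerfectSeparationError
    print(type(e).__name__, e)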


To come back to the original point:
I think edge cases are an area where having a large user base that
does implicit functional testing is an advantage, and where I would
trust packages that are popular more than those that merely have a
larger test suite (when that's not the same).

Josef
<making up percentages>

>
> Cheers,
>
> Matthew


