[SciPy-User] About wrong results from scipy statistical distributions

Tue May 14 16:13:40 EDT 2013

On Tue, May 14, 2013 at 3:34 PM, Sergio Rojas <sergio_r at mail.com> wrote:
> I am wondering whether there are specific examples one could run to check
> what exactly is wrong with skew and kurtosis in scipy, as mentioned in the
> "Remaining Issues" [1] section of the documentation
> [http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#remaining-issues].
>
> It is also mentioned there that there is a range of values on which the
> scipy distributions give wrong results. Is there any other document
> explaining this (and the previous) issue further?
>
> Thanks in advance,
>
> Sergio
>
> [1]
>
> Remaining Issues
>
> skew and kurtosis, 3rd and 4th moments and entropy are not thoroughly tested
> and some coarse testing indicates that there are still some incorrect
> results left.

I think I left some tests in the test suite that are not run.
The problem is that these are coarse statistical tests that produce
several false failures.
There are also problems when the moments don't even exist.
https://github.com/scipy/scipy/issues/1329#issuecomment-17022751 is
the list from the last time I looked at this, IIRC.
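
To give an idea of what such a coarse statistical test looks like, here is
a minimal sketch (not the code in the scipy test suite). The helper name
coarse_skew_check, the sample size, the tolerance and the seed are all
assumptions made for illustration. It compares the skew reported by the
distribution with the skew of a random sample.

```python
import numpy as np
from scipy import stats


def coarse_skew_check(dist, n=200_000, tol=0.1, seed=12345):
    """Compare the reported skew of `dist` with the skew of a random sample.

    Hypothetical helper: n, tol and seed are arbitrary illustrative choices.
    """
    sample = dist.rvs(size=n, random_state=seed)
    theoretical = float(dist.stats(moments="s"))
    empirical = float(stats.skew(sample))
    return abs(theoretical - empirical) < tol, theoretical, empirical


ok, theoretical, empirical = coarse_skew_check(stats.gamma(2.5))
print(ok, theoretical, empirical)
```

The sample skew and kurtosis have a large sampling error for heavy-tailed
distributions, so a fixed tolerance like this one will flag some correct
results as failures, and it is meaningless when the moment does not exist.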

We should also be able to get the moments by numerical integration,
but I haven't tried that yet.
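
A rough sketch of the numerical integration idea, assuming a frozen
continuous distribution with a well-behaved pdf. The helper name
moments_by_quadrature is made up for this example, and the gamma
distribution is only a convenient test case with known moments.

```python
import numpy as np
from scipy import integrate, stats


def moments_by_quadrature(dist):
    """Mean, variance, skew and excess kurtosis from integrals of the pdf."""
    lo, hi = dist.support()

    def integral(func):
        # Integrate func(x) * pdf(x) over the support of the distribution.
        value, _abserr = integrate.quad(lambda x: func(x) * dist.pdf(x), lo, hi)
        return value

    mean = integral(lambda x: x)
    var = integral(lambda x: (x - mean) ** 2)
    skew = integral(lambda x: (x - mean) ** 3) / var ** 1.5
    kurt = integral(lambda x: (x - mean) ** 4) / var ** 2 - 3.0  # excess kurtosis
    return mean, var, skew, kurt


dist = stats.gamma(2.5)
numerical = moments_by_quadrature(dist)
reported = [float(m) for m in dist.stats(moments="mvsk")]
print(np.allclose(numerical, reported, rtol=1e-6))
```

This only helps when the moments exist and quad converges. For pdfs with
singularities or very heavy tails the integration itself can be unreliable.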

I think some of the entropies may have the wrong sign.
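
Note that a negative differential entropy is not wrong by itself, so a
sign check has to compare against an independent calculation. A minimal
sketch, assuming the pdf can be integrated with quad. The helper name
entropy_by_quadrature is hypothetical, and beta(2, 5) is just an example
whose entropy is legitimately negative.

```python
import numpy as np
from scipy import integrate, special, stats


def entropy_by_quadrature(dist):
    """Differential entropy, -integral of p(x) * log(p(x)) over the support."""
    lo, hi = dist.support()

    def integrand(x):
        p = dist.pdf(x)
        return -special.xlogy(p, p)  # xlogy returns 0 where p == 0

    value, _abserr = integrate.quad(integrand, lo, hi)
    return value


dist = stats.beta(2, 5)
print(float(dist.entropy()), entropy_by_quadrature(dist))
```

If the two numbers agree up to sign only, the entropy formula of that
distribution is a candidate for a sign bug.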

Essentially, it requires going through the list and checking every
suspicious skew, kurtosis and entropy value to see whether it is a false
alarm or a bug.

Related issues: non-existent moments are not correctly specified.
Searching the issues for skew and kurtosis finds these two:
https://github.com/scipy/scipy/issues/2401
https://github.com/scipy/scipy/issues/1866
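
As a quick way to see what is reported for moments that do not exist, here
is a small sketch. The helper name nonfinite_moments is made up for this
example. The Cauchy distribution has no moments at all, and the t
distribution with 3 degrees of freedom has a finite variance but no finite
fourth moment. Whether a missing moment shows up as nan or inf may depend
on the distribution and the scipy version, so the helper only checks for
non-finite values.

```python
import numpy as np
from scipy import stats


def nonfinite_moments(dist):
    """Names of the 'mvsk' moments that dist.stats reports as nan or inf."""
    names = ("mean", "variance", "skew", "kurtosis")
    values = dist.stats(moments="mvsk")
    return [name for name, value in zip(names, values) if not np.isfinite(value)]


print(nonfinite_moments(stats.cauchy()))
print(nonfinite_moments(stats.t(3)))
```

A finite number returned for a moment that is known not to exist would be
one of the incorrectly specified cases.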

>
> the distributions have been tested over some range of parameters, however in
> some corner ranges, a few incorrect results may remain.

This is a generic warning.
Some cases are known and have tickets, like truncnorm if you work only
far out in the right tail.
There are some functions where the pdf is singular (-> inf) at the
boundary (or maybe also in the interior), and the results can get strange
when we get close enough.
(Here is a case that Warren fixed: https://github.com/scipy/scipy/pull/106)
Some distributions degenerate to a single point at the limit of the
parameter space, and I don't know how close to the limits they start
to get "weird", i.e. numerical problems could dominate the result.

There can also be numerical precision problems in the scipy.special
functions (but Pauli has been improving those a lot).

Contributions here would be very helpful.
Also, if a case is identified as a false alarm, it would be good to know
so it can be taken off the "suspicious" list.

Josef
