[issue39218] Assertion failure when calling statistics.variance() on a float32 Numpy array

Mark Dickinson report at bugs.python.org
Thu Aug 26 04:21:11 EDT 2021


Mark Dickinson <dickinsm at gmail.com> added the comment:

> The rounding correction in _ss() looks mathematically incorrect to me [...]

I don't think it was intended as a rounding correction - I think it's just computing the variance (prior to the division by n or n-1) of the `(x - c)` terms using the standard "expectation of x^2 - (expectation of x)^2" formula:

  sum((x - c)**2 for x in data) - (sum(x - c for x in data)**2) / n

So I guess it *can* be thought of as a rounding correction, but what it's correcting for is an inaccurate value of "c"; it's not correcting for inaccuracies in the subtraction results. That is, if you were to add an artificial error into c at some point before computing "total" and "total2", that correction term should take you back to something approaching the true sum of squares of deviations.
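
As a rough illustration of that point (a sketch only, not the actual code in statistics.py; the helper name `ss_about` and the sample data are made up for this example), exact Fraction arithmetic shows the correction term undoing a deliberately wrong "c":

```python
from fractions import Fraction


def ss_about(data, c):
    """Sum of squared deviations, computed about an arbitrary centre c.

    Hypothetical helper for illustration: applies the formula quoted above,
    sum((x - c)**2) - (sum(x - c))**2 / n.
    """
    n = len(data)
    total = sum((x - c) ** 2 for x in data)
    total2 = sum(x - c for x in data)
    return total - total2 ** 2 / n


data = [Fraction(v) for v in (1, 2, 4, 7)]
true_mean = sum(data) / len(data)

# Reference value: sum of squared deviations about the exact mean.
exact = sum((x - true_mean) ** 2 for x in data)

# Inject an artificial error into c; the correction term compensates for it.
perturbed = ss_about(data, true_mean + Fraction(1, 3))

# With the exact mean, the "total2" sum is exactly zero.
residual = sum(x - true_mean for x in data)

# With floats the same formula only gets close to the true value.
approx = ss_about([1.0, 2.0, 4.0, 7.0], 3.5 + 0.1)

print(exact, perturbed, residual, approx)
```

With exact rationals the perturbed result equals the reference value; with floats it should only be expected to land near it.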

So mathematically, I think it's correct, but not useful: when "c" is the exact mean, "total2" is mathematically zero, so the correction term vanishes. Numerically, it's probably not helpful.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue39218>
_______________________________________

