[issue21046] Document formulas used in statistics

Steven D'Aprano report at bugs.python.org
Fri May 16 12:17:09 CEST 2014


Steven D'Aprano added the comment:

On Fri, May 16, 2014 at 07:50:16AM +0000, Ezio Melotti wrote:

> Do you want to propose a patch?

I'm really not sure that I agree with this request. I'm currently 
sitting on the fence, undecided, about 60% against and 40% in favour of 
explicitly documenting the formulae. This is not Mathworld or Wikipedia, 
and it is easy to google for "variance" to find out what it means.

This request originally came from somebody who claimed he didn't know 
what the functions were from the names (mean, median, variance) but 
would recognise them from the formulae. Given how hard it is to 
accurately portray mathematical formulae in plain text, and how many 
different versions of the mathematical formulae there are, I don't think 
that will apply to very many people.

There's no good way to write mathematical functions *accurately* in 
ASCII text. I can write mean(L) = sum(L)/len(L), for example; that's 
quite trivial. But it's not the usual mathematical formula. If the OP 
doesn't recognise the name "mean", will he recognise that non-standard 
formula? Should the docs include μ = ∑x÷n? But even that's not quite 
accurate -- where's the subscript on the x? The reader needs to 
understand the formula, and they aren't going to get that here. They 
probably have to go read Mathworld or Wikipedia regardless.
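A minimal sketch, assuming Python 3.4 or later (where the statistics module exists), of what the ASCII formula mean(L) = sum(L)/len(L) looks like when written literally, next to the standard-library statistics.mean(). The helper name naive_mean and the sample data are hypothetical, chosen only for illustration.

```python
import statistics

def naive_mean(data):
    """Hypothetical helper: the plain-text formula mean(L) = sum(L)/len(L), taken literally."""
    data = list(data)
    return sum(data) / len(data)

sample = [1, 2, 3, 4, 4]               # illustrative data
print(naive_mean(sample))              # 2.8
print(statistics.mean(sample))         # 2.8
```

For ordinary data like this the two agree. They differ at the edges: the literal formula raises ZeroDivisionError on empty input, while statistics.mean raises statistics.StatisticsError. The formula alone does not convey that behaviour.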

The problem is compounded with variance. Which of these should we write?

    σ² = ∑(x - μ)² ÷ n
    s² = ∑x² ÷ n - μ²
    s[n]² = ∑(x - a)² ÷ n
    Var(X) = E[(X-μ)²]
    Var(X) = E[X²] - (E[X])²

or something else?
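To make the ambiguity concrete, here is a rough Python sketch (the function names are hypothetical) that codes three of the candidate formulas directly. The first is the definitional form σ² = ∑(x - μ)² ÷ n. The second is the "sum of squares minus square of mean" form, which matches Var(X) = E[X²] - (E[X])² for equally weighted data. The third uses divisor n - 1 (Bessel's correction). The first two are the population variance computed by statistics.pvariance; the third is the sample variance computed by statistics.variance. The data values are illustrative and were chosen so every step is exact in binary floating point.

```python
import statistics

def var_definitional(xs):
    # sigma^2 = sum((x - mu)^2) / n   (population variance, divisor n)
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

def var_computational(xs):
    # sum(x^2)/n - mu^2, i.e. E[X^2] - (E[X])^2 for equally weighted data
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2

def var_sample(xs):
    # sum((x - xbar)^2) / (n - 1)   (sample variance, Bessel's correction)
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
print(var_definitional(data))      # 1.25
print(var_computational(data))     # 1.25
print(statistics.pvariance(data))  # 1.25
print(var_sample(data), statistics.variance(data))
```

The first two forms are algebraically identical, but the second is known to lose precision through cancellation in floating point when the mean is large relative to the spread. This is one more reason a single printed formula can mislead a reader about how the function actually computes its result.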

What do other statistics packages do? I wouldn't want to do *less* -- if 
it is common for other stats packages to show the formula, then I would 
agree we should do the same. R doesn't seem to do so:

http://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21046>
_______________________________________

