[Numpy-discussion] var and std
Aarre Laakso
aarre at pair.com
Tue Nov 28 07:42:05 EST 2006
Hello,
I was wondering if someone could explain the rationale for changing
.var() and .std() in release 1.0b1 from normalizing by n-1 (unbiased
estimate from sample) to normalizing by n (population)?
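
To make the two normalizations concrete, here is a minimal sketch that computes both by hand. The sample array `x` and the variable names are made up for illustration; nothing here is taken from NumPy's own implementation.

```python
import numpy as np

# Illustrative data only (hypothetical sample, mean is 5.0).
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# Sum of squared deviations from the mean.
ss = np.sum((x - x.mean()) ** 2)

# Normalizing by n: the "population" variance.
var_population = ss / n

# Normalizing by n-1: the unbiased estimate from a sample.
var_sample = ss / (n - 1)

print(var_population, var_sample)
```

The two results differ by a factor of n/(n-1), which is why tests written against the n-1 behaviour fail once the default switches to n.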
I have found the note that this change happened in the Release Notes
http://www.scipy.org/ReleaseNotes/NumPy_1.0
and the change itself in Changeset 2560
http://projects.scipy.org/scipy/numpy/changeset/2560
as well as a related documentation change in Ticket 388
http://projects.scipy.org/scipy/numpy/ticket/388
but I have not been able to find a description of why the change was
made, despite searching the website, the Trac and the mailing list.
I am aware of the argument that, if the difference between n and n-1
matters to you, then you are "up to no good". On the other hand, this
change breaks a lot of my unit tests. It also seems to violate the
principle of least surprise: every other numerical environment that I
have used divides by n-1 by default. Examples include MATLAB:
http://www.mathworks.com/access/helpdesk/help/techdoc/ref/index.html?/access/helpdesk/help/techdoc/ref/std.html&
http://www.mathworks.com/access/helpdesk/help/techdoc/ref/index.html?/access/helpdesk/help/techdoc/ref/var.html&
Octave:
http://www.gnu.org/software/octave/doc/interpreter/Basic-Statistical-Functions.html
and R:
http://finzi.psych.upenn.edu/R/library/stats/html/cor.html
http://finzi.psych.upenn.edu/R/library/stats/html/sd.html
The change also seems to produce an inconsistent interface: cov() still
normalizes by n-1 rather than n. cov() does have a 'bias' parameter that
allows normalizing by n, which is similar to the compromises provided in
the other numerical packages listed above. As an aside, cov() is
provided only as a function, not as a method.
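
The inconsistency described above can be sketched as follows. This assumes a NumPy release that includes the change (var() dividing by n by default); the data and variable names are hypothetical.

```python
import numpy as np

# Illustrative data only (hypothetical sample).
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# cov() on a single 1-D variable returns its variance, normalized by n-1.
cov_default = float(np.cov(x))

# The 'bias' parameter switches cov() to normalizing by n.
cov_biased = float(np.cov(x, bias=True))

# var() normalizes by n by default after the change.
var_default = float(np.var(x))

print(cov_default, cov_biased, var_default)
```

With these defaults, var() agrees with cov() only when bias=True is passed to cov().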
In light of all that, I am sure there must have been a good reason for
the change, and I am very curious what it was. Thanks for any insight
you can offer.
Regards,
Aarre
--
Aarre Laakso
http://www.laakshmi.com/aarre/