In numpy, why is it ok to do matrix.mean(), but not ok to do matrix.median()?

Thomas Jollans tjol at tjol.eu
Tue May 1 15:38:31 EDT 2018


On 01/05/18 19:57, C W wrote:
> matrix.median()        # throws error message

READ error messages. At the very least, quote error messages when asking
questions somewhere like here. There I was, wondering why the numpy docs
didn't mention ndarray.median when you were clearly using it...

Anyway,

> Hello everyone,
> 
> In numpy, why is it ok to do matrix.mean(), but not ok to do
> matrix.median()? To me, they are two of many summary statistics. So, why
> is median() different?

First, this is how it's different: the method ndarray.median simply does
not exist.
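
To make that concrete, here is a minimal sketch using the array from the question. The variable names are just for illustration, and the exact wording of the error message may vary between numpy versions, so the sketch only captures it rather than quoting it.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# The mean is available both as a function and as a method.
mean_from_function = np.mean(matrix)
mean_from_method = matrix.mean()

# The median is only available as a function.
median_of_all = np.median(matrix)
median_per_column = np.median(matrix, axis=0)

# Calling the non-existent method raises AttributeError.
try:
    matrix.median()
except AttributeError as exc:
    error_message = str(exc)
```

So np.median(matrix) is the spelling to use; the axis argument works the same way as it does for np.mean.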

Now, of course, most numpy functions don't have method versions, so
there's nothing special about median here. However, as you quite rightly
point out, median would be "a good fit".
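
If you want to check for yourself which functions also exist as methods, a quick sketch like the following works. The list of names is just a hand-picked sample, not anything official.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# A hand-picked sample of numpy function names to look up on the array.
names = ["mean", "sum", "std", "median", "percentile", "average"]

# True where ndarray offers a method of the same name, False otherwise.
has_method = {name: hasattr(arr, name) for name in names}

for name, present in has_method.items():
    print(f"ndarray.{name}: {'method exists' if present else 'function only'}")
```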

As with most things that are "a bit odd" about numpy, this probably boils
down to "historical reasons".

numpy mostly goes back to an earlier package called "Numeric". Numeric's
array did not have methods like mean(). Early (anno 2005) numpy also
incorporated features from a package called "numarray"; one of these
features was array methods like ".mean". numarray did not have a
.median method, though it *did* have a function numarray.mlab.median. So
far, so good. ndarray.mean *must* exist for compatibility reasons,
ndarray.median need not.

So why not add it later? Other methods have been added, after all. Well,
for starters, "who cares?". Most of numpy is functions; the methods are
nice, but not that important. The other obstacle is that numpy.median is
implemented in Python, not in C. For historical reasons™, it has to stay
that way. They tried to change it in 2014, but that broke some other
packages...


> 
> Here's an example code,
> 
> import numpy as np
> matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
> 
> # find the mean
> np.mean(matrix)      # ok
> matrix.mean()          # ok
> 
> # find the median
> np.median(matrix)  # ok

> 
> 
> Also, why have two of the same thing: np.mean(matrix) and matrix.mean()?
> When to use which one?
> 
> The documentation below looks almost identical! What am I missing here?
> [1] Median documentation:
> https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.median.html
> [2] Mean documentation:
> https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.mean.html
> 
> 
> Thank you so much,
> 
> M
> 



