[Numpy-discussion] Medians that ignore values

Anne Archibald peridot.faceted at gmail.com
Sat Sep 20 02:51:26 EDT 2008


2008/9/19 Eric Firing <efiring at hawaii.edu>:
> Pierre GM wrote:
>
>>> It seems to me that there are pragmatic reasons
>>> why people work with NaNs for missing values,
>>> that perhaps should not be dismissed so quickly.
>>> But maybe I am overlooking a simple solution.
>>
>> nansomething solutions tend to be considerably faster, that might be one
>> reason. A lack of visibility of numpy.ma could be a second. In any case, I
>> can't but agree with other posters: a NaN in an array usually means something
>> went astray.
>
> Additional reasons for using nans:
>
> 1) years of experience with Matlab, in which using nan for missing
> values is the standard idiom.

Users are already retraining to use zero-based indexing; I don't think
asking them to use a full-featured masked array package is an
unreasonable retraining burden, particularly since this idiom breaks
as soon as they want to work with arrays of integers or records.
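
To make the integer case concrete, here is a minimal sketch, assuming a
NumPy installation that provides numpy.ma. The array values and variable
names are invented for illustration only.

```python
import numpy as np

# Integer arrays have no NaN representation, so the NaN-as-missing
# idiom cannot be applied: the assignment below raises ValueError.
ints = np.array([4, 8, 15, 16])
try:
    ints[1] = np.nan
    nan_assignment_failed = False
except ValueError:
    nan_assignment_failed = True

# A masked array marks the same element as invalid while keeping
# the integer dtype of the underlying data.
masked_ints = np.ma.masked_array(ints, mask=[False, True, False, False])

# The median is taken over the unmasked values only (4, 15, 16).
median_of_valid = np.ma.median(masked_ints)
```

The mask lives alongside the data rather than inside it, which is why the
same approach carries over to dtypes that have no spare "invalid" value.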

> 2) convenient interfacing with extension code in C or C++.
>
> The latter is a factor in the present use of nan in matplotlib; using
> nan for missing values in an array passed into extension code saves
> having to pass and process a second (mask) array.  It is fast and simple.

How hard is it to pass an array where the masked values have been
filled with nans? It's certainly easy to go the other way (mask all
nans). I think this is less painful than supporting two
differently-featured sets of functions for dealing with arrays
containing some invalid values.
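
Both conversions can be sketched in a few lines, assuming numpy.ma is
available. The sample data and the name for_extension are made up; the
extension code itself is not shown.

```python
import numpy as np

# Float data that uses NaN to mark a missing value.
data = np.array([1.0, np.nan, 3.0, 4.0])

# NaN -> mask: masked_invalid masks every NaN (and inf) entry.
masked = np.ma.masked_invalid(data)

# Statistics on the masked array skip the masked entries.
med = np.ma.median(masked)

# mask -> NaN: filled() returns a plain ndarray with the masked
# entries replaced by NaN, ready to hand to NaN-aware extension code.
for_extension = masked.filled(np.nan)
```

With this round trip, only the boundary to the extension code needs to
know about NaNs; everything else can work with the masked array.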

Anne


