[Numpy-discussion] Medians that ignore values

Thu Sep 18 07:27:49 EDT 2008

I have data from biological experiments that is represented as a list of 
about 5000 triples. I would like to convert this to a list of the median 
of each triple. I did some profiling and found that numpy was about 
12 times faster for this application than using regular Python lists and 
a list median implementation. I'll be performing quite a few 
mathematical operations on these values, so using numpy arrays seems 
sensible.

The only problem is that my data has gaps in it - where an experiment 
failed, a "triple" will not have three values. Some will have 2, 1 or 
even no values. To keep the arrays regular so that they can be used by 
numpy, is there some dummy value I can use to fill these gaps that will 
be ignored by the median routine?
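
To make the layout concrete, here is a minimal sketch of the kind of padding I mean. The sample values and the pad_triples helper are made up for illustration, and NaN is only one candidate fill value:

```python
import numpy as np

# Hypothetical sample data: each inner list holds 0 to 3 measurements.
triples = [[1.0, 3.0, 2.0], [4.0, 6.0], [5.0], []]

def pad_triples(rows, width=3, fill=np.nan):
    """Return a (len(rows), width) float array, padding short rows with `fill`."""
    padded = np.full((len(rows), width), fill, dtype=float)
    for i, row in enumerate(rows):
        # Copy the values that exist; the remaining slots keep the fill value.
        padded[i, :len(row)] = row
    return padded

data = pad_triples(triples)
print(data.shape)
```

Every row then has the same length, so the whole data set fits in one two-dimensional array, with the gaps marked by the fill value.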

I tried NaN for this, but as far as median is concerned, it counts as 
infinity:

 >>> from numpy import *
 >>> median(array([1,3,nan]))
3.0
 >>> median(array([1,nan,nan]))
nan
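
A sketch of what seems to be happening, assuming median simply sorts the values and picks the middle one (this is my guess, not something I have checked in the numpy source). np.sort places NaN after every other value, so NaN ends up where infinity would. The middle_of_sorted helper is a hypothetical stand-in, not a numpy function:

```python
import numpy as np

def middle_of_sorted(values):
    """Hypothetical stand-in for a sort-based median of an odd-length array."""
    # np.sort places NaN at the end, after all ordinary numbers.
    ordered = np.sort(np.asarray(values, dtype=float))
    return ordered[len(ordered) // 2]

one_gap = middle_of_sorted([1, 3, np.nan])        # sorted order: 1, 3, nan
two_gaps = middle_of_sorted([1, np.nan, np.nan])  # sorted order: 1, nan, nan
print(one_gap, two_gaps)
```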

Is this the correct behavior for median with nan? Is there a fix for 
this, or am I going to have to settle for using lists?

Thanks,

Peter