[SciPy-Dev] stats.nanstd interface

Mon Jun 21 10:15:36 EDT 2010

On 06/20/2010 08:58 PM, Pierre GM wrote:
> On Jun 20, 2010, at 9:44 PM, Skipper Seabold wrote:
>    
>>> I really think that all these stats 'nan functions' probably could
>>> just be converted into masked arrays and using the appropriate masked
>>> array functions instead of creating separate functions. This would
>>> also address how to handle the 'out' argument.
>>>
>>>        
>> Someone can correct me if I'm wrong, but I believe that there is a
>> performance hit for using masked arrays over the nan functions.  Wes
>> and Keith have mentioned it wrt pandas and larry, if I recall.
>>      
There is no performance hit here because you are comparing two totally 
different things!

> Not a surprise at all: the nanfunctions make use of np.putmask, which is quite efficient, while MaskedArrays have their extra baggage (in __array_finalize__), which tends to slow things down. However, the nanfunctions work only w/ float arrays, while the MaskedArray functions are more generic.
>    
Furthermore, masked arrays give you tremendous flexibility: you decide 
what counts as missing, and you can undo or modify the mask as needed.

 >>> import numpy as np
 >>> m = np.ma.arange(10)
 >>> m.mask = m.data > 6
 >>> m
masked_array(data = [0 1 2 3 4 5 6 -- -- --],
              mask = [False False False False False False False  True  True  True],
        fill_value = 999999)
 >>> m.sum()
21
 >>> m.mask = m.data > 7
 >>> m
masked_array(data = [0 1 2 3 4 5 6 7 -- --],
              mask = [False False False False False False False False  True  True],
        fill_value = 999999)
 >>> m.sum()
28
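
The "undo" part can be sketched the same way. This is only an
illustration, not part of the session above: it assumes that assigning
np.ma.nomask to the mask clears it, and the names masked_total and
full_total are made up for the example.

```python
import numpy as np

# Same integer array as in the session above.
m = np.ma.arange(10)

# Treat everything greater than 6 as missing.
m.mask = m.data > 6
masked_total = m.sum()   # sums only the unmasked values 0..6

# Undo the mask: the underlying data was never modified,
# so all ten values take part in the sum again.
m.mask = np.ma.nomask
full_total = m.sum()
```

Because the data is left untouched, changing your mind about what is
missing costs a mask assignment rather than a copy of the array.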

It is more a question of what you want to do: while using NaN is 
faster than using a masked array in some cases, you can slow down 
considerably if you have to recreate the functionality you require.  
Also, you have to know where and how the masked arrays are being used, 
because the masked-array component may be a very small part of the 
overall problem.
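
As a rough sketch of that tradeoff (not a benchmark), the two
approaches can be put side by side. It assumes a NumPy recent enough
to provide np.nanstd, standing in here for the stats.nanstd under
discussion; the -999 sentinel and the variable names are invented for
the example.

```python
import numpy as np

# Float data: NaN marks the missing value.
x = np.array([1.0, 2.0, np.nan, 4.0])

# NaN-aware function: skips the NaN entry.
nan_result = np.nanstd(x)

# Masked-array route: mask the invalid entries, then use the
# ordinary std method. Both use ddof=0 by default.
ma_result = np.ma.masked_invalid(x).std()

# Integer data cannot hold NaN, so a NaN function has nothing to
# skip. A masked array can still mark a sentinel as missing
# (-999 is a hypothetical sentinel chosen for this example).
counts = np.array([3, 5, -999, 7])
int_mean = np.ma.masked_equal(counts, -999).mean()
```

The two standard deviations agree; the difference is in speed on float
arrays versus generality on everything else.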

Bruce