[SciPy-Dev] stats.nanstd interface
Bruce Southey
bsouthey at gmail.com
Mon Jun 21 10:15:36 EDT 2010
On 06/20/2010 08:58 PM, Pierre GM wrote:
> On Jun 20, 2010, at 9:44 PM, Skipper Seabold wrote:
>
>>> I really think that all these stats 'nan functions' probably could
>>> just be converted into masked arrays and using the appropriate masked
>>> array functions instead of creating separate functions. This would
>>> also address how to handle the 'out' argument.
>>>
>>>
>> Someone can correct me if I'm wrong, but I believe that there is a
>> performance hit for using masked arrays over the nan functions. Wes
>> and Keith have mentioned it wrt pandas and larry, if I recall.
>>
There is no performance hit here because you are comparing two totally
different things!
> Not a surprise at all: the nanfunctions make use of np.putmask, which
> is quite efficient, while MaskedArrays have their extra baggage (in
> __array_finalize__), which tends to slow things down. However, the
> nanfunctions work only w/ float arrays, while the MaskedArray functions
> are more generic.
>
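To illustrate the float-only point, here is a minimal sketch (the variable
names are made up for this example, and it assumes a NumPy that provides
np.nansum and np.ma.masked_greater). An integer array cannot store NaN, so the
NaN route needs a float copy first, while the masked route keeps the integer
dtype:

```python
import numpy as np

ints = np.arange(10)  # integer dtype, which has no NaN value

# Writing NaN into an integer array fails.
try:
    ints[7] = np.nan
    nan_fits_in_int = True
except ValueError:
    nan_fits_in_int = False

# NaN route: convert to float, then overwrite the "missing" entries.
as_float = ints.astype(float)
as_float[as_float > 6] = np.nan
nan_sum = np.nansum(as_float)

# Masked route: the data stay integers; only a boolean mask is added.
masked = np.ma.masked_greater(ints, 6)
masked_sum = masked.sum()
```

Both sums skip the same three entries; only the dtype handling differs.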
Furthermore, masked arrays give you tremendous flexibility: you can
decide which values you want to treat as missing, and even undo or
modify the mask as needed.
>>> import numpy as np
>>> m = np.ma.arange(10)
>>> m.mask = m.data > 6
>>> m
masked_array(data = [0 1 2 3 4 5 6 -- -- --],
             mask = [False False False False False False False  True  True  True],
       fill_value = 999999)
>>> m.sum()
21
>>> m.mask = m.data > 7
>>> m
masked_array(data = [0 1 2 3 4 5 6 7 -- --],
             mask = [False False False False False False False False  True  True],
       fill_value = 999999)
>>> m.sum()
28
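For contrast, a rough sketch of why the same "undo" is not available with NaN
(illustrative names only): marking a value as NaN overwrites it, whereas a
mask leaves the underlying data untouched and can be dropped again.

```python
import numpy as np

# NaN route: the original values 7, 8, 9 are overwritten and lost.
x = np.arange(10, dtype=float)
x[x > 6] = np.nan
value_survives_nan = not np.isnan(x[7])

# Masked route: the data are still there underneath the mask.
m = np.ma.arange(10)
m.mask = m.data > 6
sum_masked = m.sum()

# Remove the mask entirely; every original value is valid again.
m.mask = np.ma.nomask
sum_unmasked = m.sum()
```

To get the NaN version back you would have to keep a separate copy of the
original array around.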
It is more a question of what you want to do: while using NaN is
faster than using a masked array in some cases, you can slow down big
time if you have to recreate the functionality you require. Also,
you have to know where and how the masked arrays are being used, because
the masked array component may be a very small part of the overall problem.
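If you want to check the speed difference on your own data, something like
the following sketch would do. It assumes a NumPy recent enough to provide
np.nanstd and np.random.default_rng (it does not use the stats.nanstd
function discussed in this thread); the array size, the 10% missing fraction
and the repeat count are arbitrary. The timings depend on the machine and
the NumPy version, so none are quoted here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100_000)
missing = rng.random(data.size) < 0.1  # flag roughly 10% as missing

# Same data, two representations of "missing".
x_nan = data.copy()
x_nan[missing] = np.nan
x_ma = np.ma.array(data, mask=missing)

# Both default to ddof=0, so the results should agree.
nan_std = np.nanstd(x_nan)
ma_std = x_ma.std()

# Time only the reduction itself, not the setup above.
t_nan = timeit.timeit(lambda: np.nanstd(x_nan), number=20)
t_ma = timeit.timeit(lambda: x_ma.std(), number=20)

print("nanstd:    %.4f s for 20 calls" % t_nan)
print("masked std: %.4f s for 20 calls" % t_ma)
```

This only times one reduction in isolation; in a real program that step may
be a small fraction of the total run time.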
Bruce