[Numpy-discussion] Warnings in numpy.ma.test()

josef.pktd at gmail.com
Thu Mar 18 20:32:38 EDT 2010


On Thu, Mar 18, 2010 at 7:26 PM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:
> josef.pktd at gmail.com wrote:
>>> I'm facing this at the moment: not a big deal, but I'm using histogram2d
>>> on some data. I just realized that it may have some NaNs in it, and I
>>> have no idea how those are being handled.
>
>> histogram2d handles neither masked arrays nor arrays with nans
>> correctly,
>
> I really wasn't asking for help (yet) .. but thanks!
>
>> >>> x2 = x[:, np.isfinite(x).all(0)]
>> >>> np.histogram2d(x2[0], x2[1], bins=3)
>> (array([[ 0.,  0.,  1.],
>>        [ 0.,  0.,  0.],
>>        [ 1.,  0.,  0.]]),
>>  array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
>>  array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))
>
> I'll probably do something like that for now. I guess the question is --
> should this be built in to histogram2d (and other similar functions)?

I think yes, for all functions that are closer to actual data and
where there is an obvious way to handle the missing values. But it's
work, and it adds a lot of code to a nice simple function. And if it's
just one extra line for the user, then it is not too high on my
priority list.
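
For reference, the one-line workaround quoted above can be written out
as a small script. This is only a sketch: the sample array x is made up
for illustration, and the names keep, counts, xedges and yedges are
mine, not anything histogram2d requires.

```python
import numpy as np

# Made-up sample data: row 0 holds the x values, row 1 the y values.
# The middle point has a NaN in its y coordinate.
x = np.array([[1.0, 2.0, 3.0],
              [2.0, np.nan, 1.0]])

# Keep only the columns (points) where both coordinates are finite.
keep = np.isfinite(x).all(axis=0)
x2 = x[:, keep]

# histogram2d now only sees the clean points.
counts, xedges, yedges = np.histogram2d(x2[0], x2[1], bins=3)
print(counts)
```

The filtering is the single extra line; everything else is the
ordinary histogram2d call.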

For example, I rewrote stats.zscore a while ago to also handle
matrices and masked arrays, and Bruce rewrote geometric mean and
others, but these are easy cases; for many of the other functions it's
more work.
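
To give an idea of what an "easy case" looks like, here is a rough
sketch of a z-score that ignores missing values by going through
numpy.ma. It is not the actual stats.zscore code; zscore_masked is a
hypothetical helper name, and it only handles the 1-D / axis=0 case.

```python
import numpy as np
import numpy.ma as ma

def zscore_masked(a):
    """Hypothetical helper: z-score along axis 0, skipping masked/NaN entries.

    Not the scipy.stats.zscore implementation, just an illustration.
    """
    a = ma.masked_invalid(a)   # mask NaN/inf, keep any existing mask
    mean = a.mean(axis=0)      # masked entries are left out of the mean
    std = a.std(axis=0)        # ... and out of the standard deviation
    return (a - mean) / std

z = zscore_masked([1.0, 2.0, np.nan, 3.0])
print(z)
```

The missing entry stays masked in the result, and the remaining values
are standardized using only the valid data.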

Also, if a function gets too much overhead, I end up rewriting and
inlining the core of the function over and over again when I need it
inside a loop, for example for optimization, or I keep a copy of the
function around that doesn't have the overhead.

I actually do little profiling, so I don't really know what the cost
would be in a loop.
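
One way to get a rough number would be a timeit comparison along these
lines. This is a sketch under assumptions: the data size, the bin count
and the helper names plain and with_check are all made up, and the
result will depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

# Made-up test data without any missing values.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2, 1000))

def plain():
    # Call histogram2d directly, no checking.
    return np.histogram2d(pts[0], pts[1], bins=10)

def with_check():
    # Same call, but with the finite-value filter applied first.
    keep = np.isfinite(pts).all(axis=0)
    return np.histogram2d(pts[0, keep], pts[1, keep], bins=10)

# Time many repeated calls, as would happen inside a loop.
t_plain = timeit.timeit(plain, number=200)
t_check = timeit.timeit(with_check, number=200)
print("plain:      %.4f s" % t_plain)
print("with check: %.4f s" % t_check)
```

The difference between the two timings is the cost of the extra
checking for this particular array size.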

Josef


> -Chris


