[Numpy-discussion] comparing arrays with NaN in them.

Christopher Barker Chris.Barker at noaa.gov
Fri Aug 24 12:08:05 EDT 2007


Matthieu Brucher wrote:
> 2007/8/24, mark <markbak at gmail.com>:
>     There may be multiple nan-s, but what Chris did is simply create one
>     with the same nan's
> 
>      >>> a = N.array((1,2,3,N.nan))
>      >>> b = N.array((1,2,3,N.nan))
> 
>     I think these should be the same.

I'm the OP, and it depends on what you mean by "the same". Yes, these two 
arrays are the same, and that's what I want to test for in this case. 
However, in the mathematical sense, I do understand why NaN == NaN 
should be false: if you're doing math, those NaNs could have been 
arrived at by very different calculations, so you really wouldn't want 
them to compare equal. So the IEEE rule that NaN does not compare 
equal to anything makes sense to me.
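
To make the behaviour concrete, here is a minimal sketch using the two arrays
quoted above. It assumes numpy is imported as N, as elsewhere in this thread;
the variable names are only for illustration.

```python
import numpy as N

a = N.array((1, 2, 3, N.nan))
b = N.array((1, 2, 3, N.nan))

# IEEE 754 comparison: NaN is not equal to anything, including itself.
nan_equals_itself = bool(N.nan == N.nan)

# The elementwise comparison is therefore False wherever a NaN appears,
# even though both arrays hold NaN in the same position.
elementwise = a == b
all_equal = bool(elementwise.all())

print(nan_equals_itself)  # False
print(elementwise)        # last element is False
print(all_equal)          # False
```

So a plain elementwise test reports the arrays as different purely because of
the NaN in the last slot.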

However, what I'm doing is testing to make sure I got the result I 
expected, so I want to know if two arrays are the same, including NaN's 
in the same places. If I wasn't working with an array package, I guess 
I'd be testing for NaN specifically where I expect it, so the solution I 
came up with before makes the most sense:

N.alltrue(a[~N.isnan(a)] == b[~N.isnan(b)])

It's not likely, but that could give a true result if the NaNs 
were in different places, as long as there were the same number of them 
and the remaining values happened to line up. So maybe there is a need for a:

nanequal, to go with:

nanargmax
nanargmin
nanmax
nanmin
nansum
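
As a rough sketch of what such a function might look like: nanequal below is
hypothetical (it is not an existing numpy function), and it assumes
floating-point input and numpy imported as N. It first checks that the NaNs
sit in the same positions and only then compares the remaining values, which
avoids the false positive described above.

```python
import numpy as N

def nanequal(a, b):
    """Hypothetical helper: True if a and b have the same shape, NaNs in
    the same positions, and equal values everywhere else."""
    a = N.asarray(a, dtype=float)
    b = N.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    a_nan = N.isnan(a)
    b_nan = N.isnan(b)
    # The NaN positions must match exactly.
    if not N.array_equal(a_nan, b_nan):
        return False
    # Compare only the non-NaN entries.
    return bool(N.all(a[~a_nan] == b[~b_nan]))

a = N.array((1, 2, 3, N.nan))
b = N.array((1, 2, 3, N.nan))
c = N.array((1, N.nan, 2, 3))  # same non-NaN values, NaN in a different place

# The mask-and-compare one-liner cannot tell a and c apart.
mask_only = bool(N.all(a[~N.isnan(a)] == c[~N.isnan(c)]))

print(mask_only)       # True, a false positive
print(nanequal(a, b))  # True
print(nanequal(a, c))  # False
```

Checking the NaN masks separately costs one extra comparison but makes the
position of each NaN part of the test.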

> You can have several  different NaN, 

You can? I thought NaN was defined by IEEE 754 as a particular bit 
pattern (one for each precision, anyway).
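
A quick way to check this, offered as a sketch rather than a definitive
answer: in IEEE 754 double precision, any value with all exponent bits set and
a nonzero fraction is a NaN, so more than one bit pattern qualifies. The
helper names below are made up for illustration, and the sketch assumes the
platform preserves the NaN payload when converting between bytes and floats.

```python
import math
import struct

def bits_to_float(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def float_to_bits(x):
    """Reinterpret an IEEE 754 double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Two quiet NaNs that differ only in the lowest fraction bit.
nan_a = bits_to_float(0x7FF8000000000000)
nan_b = bits_to_float(0x7FF8000000000001)

both_nan = math.isnan(nan_a) and math.isnan(nan_b)
different_bits = float_to_bits(nan_a) != float_to_bits(nan_b)

print(both_nan, different_bits)
```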

Warren Focke wrote:
> Maybe something with masked arrays?

In this case, I'm using NaN to mean: "no valid data", so masked arrays 
are probably a better solution anyway. However, I like the simplicity of 
storing a non-value in the same binary array.
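
For comparison, a minimal sketch of the masked-array route. It assumes the
numpy.ma module as shipped with current NumPy (which may differ from either
implementation under discussion here), and the variable names are only
illustrative.

```python
import numpy as N

a = N.array((1, 2, 3, N.nan))
b = N.array((1, 2, 3, N.nan))

# Mask out the NaN (and inf) entries instead of comparing them.
ma = N.ma.masked_invalid(a)
mb = N.ma.masked_invalid(b)

# Two masked arrays match if the masks agree and the unmasked data agree.
same_mask = N.array_equal(N.ma.getmaskarray(ma), N.ma.getmaskarray(mb))
same_data = N.array_equal(ma.compressed(), mb.compressed())
same = bool(same_mask and same_data)

print(same)  # True
```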

However, if I do go with masked arrays:

What's the status of the two masked array implementations? Which should 
I use? Unless there are huge feature differences (which I don't think 
there are), then I want to use the one that's going to get maintained 
into the future -- do we know yet which that will be?

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


