[SciPy-dev] Statistics toolbox and nans

Travis Oliphant oliphant at ee.byu.edu
Fri Nov 1 13:55:55 EST 2002


>
> On 1 Nov 2002, A.J. Rossini wrote:
>
> > >>>>> "travis" == Travis Oliphant <oliphant.travis at ieee.org> writes:
> >
> >     travis> Hello developers.
> >     travis> What should we do about nan's and the stats toolbox.  Stats is one
> >     travis> package where people may use nans to represent missing values.
> >
> > Yech.  This is a hard issue, but NAN isn't the solution.
>
> I too think that using NANs to represent missing values cannot be
> reliable. There's too much weirdness going on with NaNs depending on the
> local C library. For example, on Linux

Well, MATLAB is cross-platform and it uses NANs like this extensively.  So
I'm not sure I buy this argument.

>
> >>> nan=float('nan')
> >>> nan==nan
> 1
> >>> nan==1
> 1

So don't use nan's that way.  That's why we have isnan(x)  to test where
the nan's are in an array.  This function should work on the platforms
where scipy works.

I agree that equality testing of nans against another float should not be
used in an algorithm.
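
To illustrate, here is a minimal sketch of locating nans with an explicit
test rather than with ==.  It assumes only the Python standard library
(math.isnan), not scipy's array-aware isnan, and the helper name
nan_positions is made up for this example.

```python
import math

def nan_positions(values):
    """Return the indices where values holds a nan (hypothetical helper)."""
    # math.isnan is an explicit test, so nothing here depends on how
    # nan == nan or nan == 1 happens to evaluate on a given platform.
    return [i for i, v in enumerate(values) if math.isnan(v)]

data = [1.0, float("nan"), 3.0, float("nan")]
print(nan_positions(data))
```

The same idea carries over to arrays: ask isnan where the nans are, then
decide what to do with those positions.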

>
> while on Windows nan==1 returns 0, as I have been told. See
>
> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=mailman.1035055286.17772.python-list%40python.org&rnum=6&prev=/groups%3Fq%3DPearu%2BPeterson%26hl%3Den%26lr%3D%26ie%3DUTF-8%26scoring%3Dd
>
> Tim Peters has explained these NAN issues several times on
> Usenet, google for 'Tim Peters NaN'.

Sure, but he hasn't gone into enough detail.  Matlab successfully does it,
so obviously it can be done (especially on modern machines that use
IEEE 754).

>
> Since "all IEEE-754 behavior visible from Python is a platform-dependent
> accident" [T.P.], I don't see that NaNs could be used in SciPy for
> anything useful in a platform-independent way.
> I would avoid using NaNs
> and Infs as much as possible until they become less platform-dependent,
> maybe by implementing special objects for Python instead of using
> float('nan'), float('inf') (which should not even work on Win32).

Right now, to me this is a straw man (a hypothetical argument).

We already are supporting nan's in scipy.  See what scipy_base.nan or
scipy_base.inf  gives you on your platform.

I would prefer specific examples that show where what scipy is doing now
is not working on specific platforms that we want to support, rather than
general arguments that refer to T.P.'s apparent distaste for nan's.   We
have already borrowed heavily from the ideas T.P. espoused.  Look deeper
into scipy_base to see what I'm talking about.

In short, I don't agree with the statements that nans don't or can't work.

Now, I agree that treating missing values using NaNs is somewhat of a
kludge.  And there are perhaps better ways to handle it.  It is a rather
efficient kludge that works much of the time.

Even if you don't officially bless nan's as "missing values," if they
ever show up in your calculation, they essentially are missing values, and
the question still remains as to how to deal with them (should you ignore
them or let them ruin the rest of your calculation?).
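
The two choices can be sketched side by side.  This is only an illustration
in plain Python, assuming the standard library's math.isnan; the function
names mean_propagate and mean_ignore are hypothetical and are not part of
the stats toolbox.

```python
import math

def mean_propagate(values):
    """Plain mean: a single nan in the input makes the result nan."""
    return sum(values) / len(values)

def mean_ignore(values):
    """Mean over the non-nan entries only; nan if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")

data = [1.0, float("nan"), 3.0]
print(mean_propagate(data))  # the nan ruins the result
print(mean_ignore(data))     # the nan is treated as a missing value
```

Either behavior is defensible; the point is that the toolbox has to pick
one, or offer both, whether or not nans are called "missing values."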

-Travis



