[SciPy-dev] Statistics toolbox and nans

A.J. Rossini rossini at blindglobe.net
Fri Nov 1 16:01:30 EST 2002


>>>>> "travis" == Travis Oliphant <oliphant at ee.byu.edu> writes:

    travis> Right now, to me this is a straw man (a hypothetical argument).

I agree (i.e. that NAN itself is not the problem -- I'd probably
complain about any value that could cause confusion).

    travis> Now, I agree that treating missing values using NaNs is somewhat of a
    travis> kludge.  And there are perhaps better ways to handle it.  It is a rather
    travis> efficient kludge that works much of the time.

    travis> Even if you don't officially bless NaNs as "missing values," if they
    travis> ever show up in your calculation, they essentially are missing values and
    travis> the question still remains as to how to deal with them (should you ignore
    travis> them or let them ruin the rest of your calculation?)
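
For concreteness, the two choices in that question can be sketched as
below.  This is only an illustration and assumes a NumPy recent enough
to provide np.nanmean; the sample data is made up.

```python
import numpy as np

# Made-up sample with one NaN standing in for a missing value.
data = np.array([1.0, 2.0, np.nan, 4.0])

# Choice 1: let the NaN propagate, so the summary itself becomes NaN.
propagated = np.mean(data)

# Choice 2: ignore the NaN and average only the remaining values.
ignored = np.nanmean(data)

print(propagated)
print(ignored)
```

Neither choice records why the value was missing in the first place.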

This is the crux of the issue -- from a statistical perspective
(different from a numerical analyst's, from what I can tell), it
would be important to flag different forms of missing data, in order
to process them in different manners.  Using a single NAN doesn't allow
for this (i.e. numerical missingness vs. statistical missingness, both
of which may be present depending on the data and the data analysis
algorithm used for processing).
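
One way to picture this is a parallel array of reason codes kept
alongside the values.  The sketch below is hypothetical, not an existing
SciPy interface: the code names (OBSERVED, NOT_RECORDED,
NUMERIC_FAILURE), the helper mean_observed, and the policy it enforces
are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical reason codes; the names are illustrative only.
OBSERVED = 0
NOT_RECORDED = 1      # statistical missingness (e.g. a skipped visit)
NUMERIC_FAILURE = 2   # numerical missingness (e.g. 0/0 in a computation)

values = np.array([1.5, np.nan, 3.0, np.nan, 4.5])
reason = np.array([OBSERVED, NOT_RECORDED, OBSERVED,
                   NUMERIC_FAILURE, OBSERVED])


def mean_observed(values, reason):
    """Average the observed entries, refusing to hide numerical failures."""
    if np.any(reason == NUMERIC_FAILURE):
        raise ValueError("numerical failure present; inspect before summarizing")
    return values[reason == OBSERVED].mean()


# Drop the numerically failed entry explicitly, then summarize the rest.
keep = reason != NUMERIC_FAILURE
result = mean_observed(values[keep], reason[keep])

# The statistically missing entries remain countable for later modeling.
n_not_recorded = int(np.sum(reason == NOT_RECORDED))
```

Here the two NaNs are indistinguishable in values, but the reason array
lets an algorithm reject one kind and skip (or later impute) the other.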

best,
-tony

-- 
A.J. Rossini				Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics		rossini at u.washington.edu	
FHCRC/SCHARP/HIV Vaccine Trials Net	rossini at scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW:   Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)





