None versus MISSING sentinel -- request for design feedback
OKB (not okblacke)
brenNOSPAMbarn at NObrenSPAMbarn.net
Fri Jul 15 13:40:58 EDT 2011
Steven D'Aprano wrote:
> Rob Williscroft wrote:
>> MISSING = MissingObject()
>> def mean( sequence, missing = MISSING ):
>
> So you think the right API is to allow the caller to specify what
> counts as a missing value at runtime? Are you aware of any other
> statistics packages that do that?
R does it, not in the statistics functions themselves but in, for instance,
read.table: when reading data from an external file, you can specify a
set of values (the na.strings argument) that will be converted to NA in
the resulting data frame.
I think it's worth considering this approach, namely separating getting
the data into your system from the calculations performed on that data.
You haven't said exactly how people are going to use your API, but your
example of "where missing data comes from" showed something like a table
of data from a survey. If that is the case, and users are going to be
importing sets of data from external files, it makes a lot of sense to
let them say "convert these particular values to MISSING when
importing".
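Here is a minimal sketch of that import-time conversion, assuming the data arrives as CSV text. The read_table function, its na_values parameter and the _MissingType sentinel class are all hypothetical names for illustration (loosely echoing R's na.strings), not an existing API; only the csv and io modules are real standard-library pieces.

```python
import csv
import io

class _MissingType:
    """Hypothetical sentinel type; its single instance marks absent data."""
    def __repr__(self):
        return "MISSING"

MISSING = _MissingType()

def read_table(text, na_values=("", "NA")):
    """Parse CSV text, replacing every cell found in na_values with MISSING.

    Hypothetical helper: the caller decides at import time which raw
    strings mean "no data", so later calculations only see MISSING.
    """
    na = set(na_values)
    reader = csv.reader(io.StringIO(text))
    return [[MISSING if cell in na else cell for cell in row] for row in reader]

rows = read_table("age,income\n34,NA\n-99,52000\n", na_values={"NA", "-99"})
print(rows)  # [['age', 'income'], ['34', MISSING], [MISSING, '52000']]
```

Once the table is loaded this way, the statistics functions never need to know that "-99" once meant "no answer" in this particular survey.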
Either way, my answer to your original question would be: if you
want to err on the side of caution, use your own MISSING value and just
provide a simple function that will MISSING-ize specified values:
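MISSING = object()  # stand-in for your own sentinel, e.g. MissingObject()

def cleanUp(data, missing=None):
    # Replace each value listed in `missing` with the MISSING sentinel.
    if missing is None:
        missing = []
    return [MISSING if d in missing else d for d in data]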
(Yet another use of None here! :-)
Then if people find that the functions they call return None (or any
other value, such as an empty string) to mean a "genuine" missing value,
they can just wrap the call in this cleanUp function. The reverse is
harder to do: if you use None as your missing-value sentinel, you
irrevocably lose the ability to tell it apart from other uses of None.
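A rough sketch of that wrapping pattern, assuming a mean() that follows the mean(sequence, missing=MISSING) signature quoted above and skips items identical to its missing argument. The _MissingType class and the survey_answers function (a stand-in for some third-party code that returns None or "" for "no answer") are hypothetical; cleanUp is repeated here so the example runs on its own.

```python
class _MissingType:
    """Hypothetical sentinel type; its single instance marks absent data."""
    def __repr__(self):
        return "MISSING"

MISSING = _MissingType()

def cleanUp(data, missing=None):
    """Replace every value listed in `missing` with the MISSING sentinel."""
    if missing is None:
        missing = []
    return [MISSING if d in missing else d for d in data]

def mean(sequence, missing=MISSING):
    """Arithmetic mean that skips items identical to `missing` (assumed semantics)."""
    values = [x for x in sequence if x is not missing]
    if not values:
        raise ValueError("mean() needs at least one non-missing value")
    return sum(values) / len(values)

# Hypothetical third-party function that uses None and "" for "no answer".
def survey_answers():
    return [4, None, 5, "", 3]

answers = cleanUp(survey_answers(), missing=[None, ""])
print(answers)        # [4, MISSING, 5, MISSING, 3]
print(mean(answers))  # 4.0
```

Because MISSING is a private object, mean() can test for it with `is` and no ordinary value can collide with it; callers who really do want None treated as missing can still pass missing=None explicitly.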
--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown