[Numpy-discussion] Masking through generator arrays
Nathaniel Smith
njs at pobox.com
Fri May 11 19:39:26 EDT 2012
On Thu, May 10, 2012 at 7:23 PM, Chris Barker <chris.barker at noaa.gov> wrote:
> That is one of my concerns about the "bit pattern" idea -- we've then
> created a new binary type that no other standard software understands
> -- that looks like a lot of work to me to deal with, or even worse,
> ripe for weird, non-obvious errors in code that access that good-old
> char*.
Numpy supports a number of unusual binary data types, e.g. halfs and
datetimes, that aren't well supported by other standard software. As
Travis points out, no-one forces you to use them :-).
> So I'm happier with a mask implementation -- more memory, yes, but it
> seems more robust and easy to deal with in outside code.
Let's say we have a no-frills C function that we want to call, and
it's defined to use a mask:
    void do_calcs(double * data, char * mask, int size);
To call this function from Cython in the masked-NA world, we do
something like:
    a = np.ascontiguousarray(a)
    do_calcs(PyArray_DATA(a), PyArray_MASK(a), a.size)
OTOH in the bitpattern NA world, we do something like:
    a = np.ascontiguousarray(a)
    mask = np.isNA(a)
    do_calcs(PyArray_DATA(a), PyArray_DATA(mask), a.size)
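As a rough illustration of why the two call patterns end up looking the same to the callee, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: `do_calcs` is a pure-Python stand-in for the C function (it just sums the unmasked entries), a nonzero mask byte is taken to mean "missing", and NaN stands in for whatever bit pattern would actually encode NA.

```python
import math

def do_calcs(data, mask, size):
    """Hypothetical stand-in for the C function: sum entries whose mask byte is 0."""
    total = 0.0
    for i in range(size):
        if not mask[i]:          # assumed convention: nonzero means "missing"
            total += data[i]
    return total

# Mask world: the mask is stored alongside the data and passed straight through.
data = [1.0, 2.0, 3.0, 4.0]
stored_mask = bytearray([0, 1, 0, 0])
masked_total = do_calcs(data, stored_mask, len(data))

# Bitpattern world: missing entries are marked in the data itself (NaN here,
# purely as a stand-in), and the mask is derived just before the call.
bp_data = [1.0, math.nan, 3.0, 4.0]
derived_mask = bytearray(int(math.isnan(x)) for x in bp_data)
bitpattern_total = do_calcs(bp_data, derived_mask, len(bp_data))
```

In both cases the callee sees the same pair of buffers; the only difference is the one extra line on the caller's side that builds the mask.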
Of course there are various extra complexities that can come in here:
what you want to do if no NAs are possible, whether do_calcs can take
a NULL mask pointer, the fact that in C instead of Cython you need to
use the C equivalent functions, etc. But IMHO there's no fundamental
reason why bitpatterns have to be much more complex to deal with in
outside code than masks, assuming a properly helpful API. What can't
be papered over at the API level are questions like: do you want to be
able to "un-assign" NA to reveal what used to be there before? That
needs masks, for better or worse.
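The "un-assign" difference can be sketched the same way. This is again plain Python under stated assumptions (a nonzero mask byte means NA, and NaN stands in for an NA bit pattern), not a description of any actual NumPy API.

```python
import math

# Mask world: assigning NA only flips the mask byte, so the payload survives.
data = [1.0, 2.0, 3.0]
mask = bytearray(len(data))   # all zeros: nothing is missing yet
mask[1] = 1                   # "assign NA" to element 1
mask[1] = 0                   # "un-assign" NA
restored = data[1]            # the original value is still there

# Bitpattern world: assigning NA overwrites the payload itself
# (NaN is only a stand-in for the NA bit pattern).
bp = [1.0, 2.0, 3.0]
bp[1] = math.nan              # the old value is gone for good
```

With the mask the old value comes back as soon as the mask byte is cleared; with the bit pattern there is nothing left to reveal unless the caller kept a copy.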
But I may well be missing something... does that address your concern,
or is there more to it?
-- Nathaniel