[Numpy-discussion] missing data discussion round 2

Matthew Brett matthew.brett at gmail.com
Wed Jun 29 15:47:01 EDT 2011


Hi,

On Wed, Jun 29, 2011 at 7:20 PM, Lluís <xscript at gmx.net> wrote:
> Mark Wiebe writes:
>
>> There seems to be a general idea that masks and NA bit patterns imply
>> particular differing semantics, something which I think is simply
>> false.
>
> Well, my example contained a difference (the need for the "skipna=True"
> argument) precisely because it seemed that there was some need for
> different defaults.
>
> Honestly, I think this difference breaks the POLA (principle of least
> astonishment).
>
>
> [...]
>> As far as I can tell, the only required difference between them is
>> that NA bit patterns must destroy the data. Nothing else. Everything
>> on top of that is a choice of API and interface mechanisms. I want
>> them to behave exactly the same except for that necessary difference,
>> so that it will be possible to use the *exact same Python code* with
>> either approach.
>
> I completely agree. What I'd suggest is a global and/or per-object
> "ndarray.flags.skipna" for people like me that just want to ignore these
> entries without caring about setting it on each operation (or the other
> way around, depends on the default behaviour).
>
> The downside is that it adds yet another tweaking knob, which is not
> desirable...

Oh dear, that would be horrible if, depending on a tweak made
somewhere in the distant past of your script, this:

>>> a = np.array([np.NA, 1.0], masked=True)
>>> np.sum(a)

could return either np.NA or 1.0...

Imagine someone twiddled the knob the other way and ran your script...
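
To make the hazard concrete, here is a minimal sketch that runs on
released NumPy. It rests on assumptions: np.NA and masked=True are only
the API proposed in this thread, so NaN stands in for NA, np.nansum
stands in for a skipna sum, and the module-level SKIPNA flag and the
total() helper are hypothetical names invented for illustration.

```python
import math
import numpy as np

# Hypothetical global knob, standing in for the proposed "skipna" default.
SKIPNA = False

def total(a):
    """Sum `a`, skipping NaN entries only when the global knob is on."""
    return np.nansum(a) if SKIPNA else np.sum(a)

# NaN stands in for np.NA, which is not part of released NumPy.
a = np.array([np.nan, 1.0])

result_default = total(a)   # the missing value propagates into the sum

SKIPNA = True               # someone twiddles the knob somewhere else
result_skipped = total(a)   # same call, same data, different answer

print(result_default, result_skipped)
```

The two calls to total(a) are textually identical, so nothing at the
call site tells the reader which answer to expect.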

See you,

Matthew


