[Numpy-discussion] consensus (was: NA masks in the next numpy release?)

Sat Oct 29 14:43:15 EDT 2011

Hi,

On Fri, Oct 28, 2011 at 8:38 PM, Benjamin Root <ben.root at ou.edu> wrote:
> Matt,
>
> On Friday, October 28, 2011, Matthew Brett <matthew.brett at gmail.com> wrote:
>>
>>> Forget about rudeness or decision processes.
>>
>> No, that's a common mistake, which is to assume that any conversation
>> about things that aren't technical is not important.   Nathaniel's
>> point is important.  Rudeness is important. The reason we've got into
>> this mess is because we clearly don't have an agreed way of making
>> decisions.  That's why countries and open-source projects have
>> constitutions, so this doesn't happen.
>
> Don't get me wrong. In general, you are right.  And maybe we all should
> discuss something to that effect for numpy.  But I would rather do that when
> contention and tempers aren't running so high.

That's a reasonable point.

> As for allegations of rudeness, I believe that we are actually so close to
> consensus that I immediately wanted to squelch any sort of
> meta-meta-disagreement about who was being rude to whom.  As a quick
> band-aid, anybody who felt slighted by me gets a drink on me at the next
> SciPy conference.  From this point on, let's institute a 10-minute rule:
> write your email, wait ten minutes, read it again and edit it.

Good offer.  I make the same one.

>>> I will start by saying that I am willing to separate ignore and absent,
>>> but only on the write side of things.  On read, I want a single way to
>>> identify the missing values.  I also want only a single way to perform
>>> calculations (either skip or propagate).
>>
>> Thank you - that is very helpful.
>>
>> Are you saying that you'd be OK setting missing values like this?
>>
>>>>> a.mask[0:2] = False
>>
>
> Probably not that far, because that would be an attribute that may or may
> not exist.  Rather, I might like the idea of an NA that "always" means absent
> (and destroys the value, even through views), and an MA (or some other name)
> that always means ignore (and has the masking behavior with views).  This
> ties each behavior distinctly to a specific object.

Ah - yes - thank you.  I think you and I at least have somewhere to go
for agreement, but I don't know how to work towards a numpy-wide
agreement.  Do you have any thoughts?
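To make the absent/ignore split concrete, here is a minimal sketch using tools that already exist in numpy, assuming plain NaN as a stand-in for the "absent" NA and the existing numpy.ma module as a stand-in for the "ignore" MA. Neither is the NA-mask implementation under discussion, and the variable names are only illustrative.

```python
import numpy as np

# "Absent" (sketched with NaN): the missing value overwrites the buffer,
# so every view of that buffer sees it, and plain reductions propagate it.
base = np.array([1.0, 2.0, 3.0, 4.0])
view = base[:]           # a view sharing base's memory
view[1] = np.nan         # writing through the view destroys base[1] too
print(base)              # base[1] is now nan; the 2.0 is gone
print(base.sum())        # nan  (propagate)
print(np.nansum(base))   # 8.0  (skip, only when asked for explicitly)

# "Ignore" (sketched with numpy.ma): the data is kept and a separate
# boolean mask says which entries to leave out; reductions skip them.
data = np.array([1.0, 2.0, 3.0, 4.0])
m = np.ma.masked_array(data, mask=[False, True, False, False])
print(m.sum())           # 8.0  (the masked 2.0 is skipped)
print(m.data)            # the underlying 2.0 is still there
m.mask = False           # clearing the (soft) mask brings the value back
print(m.sum())           # 10.0
```

The NaN write goes through the view and cannot be undone, while the mask can be cleared to recover the original value; that is roughly the write-side difference. On the read side, both can be asked which values are missing (`np.isnan(base)` and `np.ma.getmaskarray(m)`), which is the kind of single query the thread is converging on.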

>> For the read side, do you mean you're OK with this
>>
>>>>> a.isna()
>>
>> To identify the missing values, as is currently the case?  Or something
>> else?
>>
>
> Yes.  A missing value is a missing value, regardless of it being absent or
> marked as ignored.  But it is a bit more subtle than that.  I should just be
> able to add two arrays together and the "data should know what to do". When
> the core ufuncs get this right (min, max, sum, cumsum, diff, etc.), then
> I don't have to do much to prepare higher-level functions for missing data.
>
>> If so, then I think we're very close, it's just a discussion about names.
>>
>
> And what does ignore + absent equal? ;-)

ignore + absent == special_value_of_some_sort :)

Just joking,

See you,

Matthew