[Numpy-discussion] Boolean binary '-' operator

Nathaniel Smith njs at pobox.com
Tue Jun 27 17:35:23 EDT 2017


On Jun 26, 2017 6:56 PM, "Charles R Harris" <charlesr.harris at gmail.com>
wrote:


> On 27 Jun 2017, 9:25 AM +1000, Nathaniel Smith <njs at pobox.com>, wrote:
>
> I guess my preference would be:
> 1) deprecate +
> 2) move binary - back to deprecated-but-not-an-error
> 3) fix np.diff to use logical_xor when the inputs are boolean, since
> that seems to be what people expect
> 4) keep unary - as an error
>
> And if we want to be less aggressive, then a reasonable alternative would
> be:
> 1) deprecate +
> 2) un-deprecate binary -
> 3) keep unary - as an error
>
>
Using '+' for 'or' and '*' for 'and' is pretty common and the variation of
'+' for 'xor' was common back in the day because 'and' and 'xor' make
boolean algebra a ring, which appealed to mathematicians as opposed to
everyone else ;)


'+' for 'xor' and '*' for 'and' is perfectly natural; that's just + and *
in Z/2. It's not only a ring, it's a field! '+' for 'or' is much weirder;
why would you use '+' for an operation that's not even invertible? I guess
it's a semi-ring. But we have the '|' character right there; there's no
expectation that every weird mathematical notation will be matched in
numpy... The most notable example is that '*' doesn't mean matrix
multiplication.
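
As a rough illustration of the Z/2 point, here is a minimal sketch that
assumes a reasonably recent NumPy. The arrays and variable names are made up
for the example. It checks that addition and multiplication modulo 2 agree
with xor and and, that xor can be undone while or cannot, and that NumPy's
'+' on boolean arrays behaves like or.

```python
import numpy as np

# Hypothetical inputs covering all four truth-table rows.
a = np.array([False, False, True, True])
b = np.array([False, True, False, True])

# Arithmetic modulo 2 on the 0/1 integer view of the booleans.
add_mod2 = (a.astype(int) + b.astype(int)) % 2
mul_mod2 = (a.astype(int) * b.astype(int)) % 2

# In Z/2, '+' coincides with xor and '*' coincides with and.
xor_matches = np.array_equal(add_mod2.astype(bool), a ^ b)
and_matches = np.array_equal(mul_mod2.astype(bool), a & b)

# xor is invertible: applying b a second time recovers a.
xor_roundtrip = np.array_equal((a ^ b) ^ b, a)

# or is not invertible: True | False and True | True both give True,
# so b cannot be recovered from (a | b) and a alone.
or_collides = (True | False) == (True | True)

# NumPy's '+' on boolean arrays acts as or, not as xor.
plus_is_or = np.array_equal(a + b, a | b)
```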


You can see the same progression in measure theory where eventually
intersection and xor (symmetric difference) were replaced with union and
complement. Using '-' for xor is something I hadn't seen outside of numpy,
but I suspect it must be standard somewhere.  I would leave '*' and '+'
alone, as the breakage and inconvenience from removing them would be
significant.


'*' doesn't bother me, because it really does have only one sensible
behavior; even built-in bool() effectively uses 'and' for '*'.

But, now I remember... The major issue here is that some people want dot(a,
b) on Boolean matrices to use these semantics, right? Because in this
particular case it leads to some useful connections to the matrix
representation for logical relations [1]. So it's sort of similar to the
diff() case. For the basic operation, using '|' or '^' is fine, but there
are these derived operations like 'dot' and 'diff' where people have
different expectations.
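
To make the 'dot' and 'diff' expectations concrete, here is a small sketch
under some assumptions: the relation R and the flags array are invented for
illustration, and the "expected" results are computed explicitly with '&',
any() and logical_xor rather than taken from any particular NumPy release.
The last check compares against what np.dot gives on boolean inputs.

```python
import numpy as np

# Hypothetical relation on {0, 1, 2}: R[i, j] is True when i relates to j.
R = np.array([[False, True,  False],
              [False, False, True],
              [False, False, False]])

# Relation composition written out as an or-of-ands:
# composed[i, k] = any over j of (R[i, j] and R[j, k]).
composed = (R[:, :, None] & R[None, :, :]).any(axis=1)

# np.dot on boolean matrices, for comparison with the explicit version.
via_dot = np.dot(R, R)

# The 'diff' people expect on booleans: xor of adjacent elements,
# which marks the positions where the value changes.
flags = np.array([False, True, True, False])
changes = np.logical_xor(flags[1:], flags[:-1])
```

Here 0 relates to 1 and 1 relates to 2, so the only pair in the composed
relation is (0, 2).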

I guess Juan's example of 'sum' is relevant here too. It's pretty weird
that if 'a' and 'b' are one-dimensional boolean arrays, 'a @ b' and 'sum(a
* b)' give totally different results.
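
A minimal sketch of that mismatch, using made-up arrays and np.sum for the
'sum' (an assumption; the built-in sum would count the same way here):

```python
import numpy as np

# Hypothetical one-dimensional boolean arrays.
a = np.array([True, True, False, True])
b = np.array([True, True, True,  False])

# Boolean inner product: is there any position where both are True?
matmul_result = a @ b

# Elementwise 'and' followed by an integer sum: how many such positions?
sum_result = np.sum(a * b)
```

The first expression collapses to a single boolean, while the second counts
the two positions where both arrays are True.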

So that's the fundamental problem: there are a ton of possible conventions
that are each appealing in one narrow context, and they all contradict each
other, so trying to shove them all into numpy simultaneously is messy.

I'm glad we at least seem to have succeeded in getting rid of unary '-';
that one was particularly indefensible in the context of everything else
:-). For the rest, I'm really not sure whether it's better to deprecate
everything and tell people to use specialized tools for specialized
purposes (e.g. add a 'logical_dot'), or to special case the high-level
operations people want (make 'dot' and 'diff' continue to work, but
deprecate + and -), or just leave the whole incoherent mish-mash alone.

-n

[1] https://en.wikipedia.org/wiki/Logical_matrix