[Python-ideas] PEP: Dict addition and subtraction

Guido van Rossum guido at python.org
Fri Mar 8 11:55:43 EST 2019


On Thu, Mar 7, 2019 at 9:12 PM Stephen J. Turnbull <
turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:

> Ka-Ping Yee writes:
>  > On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <rosuav at gmail.com> wrote:
>
>  > > But adding dictionaries is fundamentally *useful*. It is expressive.
>  >
>  > It is useful.  It's just that + is the wrong name.
>
> First, let me say that I prefer ?!'s position here, so my bias is made
> apparent.  I'm also aware that I have biases so I'm sympathetic to
> those who take a different position.
>

TBH, I am warming up to "|" as well.


> Rather than say it's "wrong", let me instead point out that I think
> it's pragmatically troublesome to use "+".  I can think of at least
> four interpretations of "d1 + d2"
>
> 1.  update
> 2.  multiset (~= collections.Counter addition)
>

I guess this explains the behavior of removing results <= 0; it makes sense
as multiset subtraction, since in a multiset a negative count makes little
sense. (Though the name Counter certainly doesn't seem to imply multiset.)
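
For readers who want to see that behavior concretely, here is a small sketch using collections.Counter from the standard library. The variable names are only illustrative.

```python
from collections import Counter

a = Counter(x=3, y=1)
b = Counter(x=1, y=2)

# Binary "-" acts like multiset subtraction: any key whose resulting
# count is zero or negative is dropped from the result.
diff = a - b

# The in-place subtract() method, by contrast, keeps negative counts.
kept = Counter(a)
kept.subtract(b)

# Binary "+" adds counts key by key, so operand order does not matter.
total = a + b
same_either_way = (a + b == b + a)
```

So the operator form treats the Counter as a multiset, while the subtract() method treats it as a plain mapping of keys to integers.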


> 3.  addition of functions into the same vector space (actually, a
>     semigroup will do ;-), and this is the implementation of
>     collections.Counter
> 4.  "fiberwise" set addition (ie, of functions into relations)
>
> and I'm very jet-lagged so I may be missing some.
>
> There's also the fact that the operations denoted by "|" and "||" are
> often implemented as "short-circuiting", and therefore not
> commutative, while "+" usually is (and that's reinforced for
> mathematicians who are trained to think of "+" as the operator for
> Abelian groups, while "*" is a (possibly) non-commutative operator).  I
> know commutativity of "+" has been mentioned before, but the
> non-commutativity of "|" -- and so unsuitability for many kinds of
> dict combination -- hasn't been emphasized before IIRC.
>

I've never heard of single "|" being short-circuiting. ("||" of course is
infamous for being that in C and most languages derived from it.)
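
In Python the distinction can be checked directly, with "or" playing the role of C's "||". This is a minimal sketch; noisy() is a hypothetical helper that only records when it gets evaluated.

```python
calls = []

def noisy(value):
    """Record that this operand was evaluated, then return it unchanged."""
    calls.append(value)
    return value

# "or" short-circuits: once the left operand is true, the right one
# is never evaluated.
r1 = noisy(True) or noisy(False)
calls_after_or = list(calls)

calls.clear()

# "|" is an ordinary binary operator: both operands are evaluated
# before the operation is applied.
r2 = noisy(True) | noisy(False)
calls_after_pipe = list(calls)
```

After the "or" expression only one call has been recorded; after the "|" expression both have.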

And "+" is of course used for many non-commutative operations in Python
(e.g. adding two lists/strings/tuples together). It is only *associative*,
a weaker requirement that just says (A + B) + C == A + (B + C). (This is
why we write A + B + C, since the grouping doesn't matter for the result.)
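
A quick check of both properties for list concatenation, as a sketch with made-up values:

```python
a, b, c = [1], [2], [3]

# Not commutative: swapping the operands changes the result.
left_then_right = a + b      # [1, 2]
right_then_left = b + a      # [2, 1]
commutes = (left_then_right == right_then_left)

# Associative: the grouping does not change the result.
associates = ((a + b) + c == a + (b + c))
```

The same holds for strings and tuples.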

Anyway, while we're discussing mathematical properties, and since SETL was
briefly mentioned, I found an interesting thing in math. For sets, union
and intersection are distributive over each other. I can't type the
operators we learned in high school, so I'll use Python's set operations.
We find that A | (B & C) == (A | B) & (A | C). We also find that A & (B |
C) == (A & B) | (A & C).
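
Both identities can be spot-checked by brute force. The sketch below tries every triple of subsets of a small universe, which was chosen arbitrarily; this is a sanity check rather than a proof.

```python
from itertools import combinations, product

universe = (1, 2, 3)

# All 8 subsets of the universe, as frozensets.
subsets = [
    frozenset(combo)
    for size in range(len(universe) + 1)
    for combo in combinations(universe, size)
]

# Union distributes over intersection.
union_over_intersection = all(
    a | (b & c) == (a | b) & (a | c)
    for a, b, c in product(subsets, repeat=3)
)

# Intersection distributes over union.
intersection_over_union = all(
    a & (b | c) == (a & b) | (a & c)
    for a, b, c in product(subsets, repeat=3)
)
```

Both flags come out True for all 512 triples.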

Note that this is *not* the case for + and * when used with (mathematical)
numbers: * distributes over +: a * (b + c) == (a * b) + (a * c), but + does
not distribute over *: a + (b * c) != (a + b) * (a + c). So in a sense,
SETL (which uses + and * for union and intersection) got the operators
wrong.
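
The asymmetry for numbers shows up with almost any values. The ones below are arbitrary.

```python
a, b, c = 2, 3, 4

# "*" distributes over "+": 2 * 7 == 6 + 8
times_over_plus = (a * (b + c) == (a * b) + (a * c))

# "+" does not distribute over "*": 2 + 12 is 14, but 5 * 6 is 30
lhs = a + (b * c)
rhs = (a + b) * (a + c)
plus_over_times = (lhs == rhs)
```

The two sides can still coincide for particular values, for example when a == 0, so what fails is the general law rather than every instance of it.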

Note that in Python, + and * for sequences are not distributive this way,
since (A + B) * n is not the same as (A * n) + (B * n). OTOH A * (n + m) ==
A * n + A * m. (Assuming A and B are sequences of the same type, and n and
m are positive integers.)
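
With concrete lists (values picked only for illustration) the difference is easy to see:

```python
A, B = [1, 2], [3]
n, m = 2, 3

# Repeating the concatenation interleaves A and B...
repeat_of_sum = (A + B) * n          # [1, 2, 3, 1, 2, 3]

# ...while concatenating the repeats groups them.
sum_of_repeats = (A * n) + (B * n)   # [1, 2, 1, 2, 3, 3]

right_distributes = (repeat_of_sum == sum_of_repeats)

# Splitting the repeat count does work.
left_distributes = (A * (n + m) == A * n + A * m)
```

The two results contain the same elements in a different order, which matters for sequences.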

If we were to use "|" and "&" for dict "union" and "intersection", the
mutual distributive properties would hold.


> Since "|" (especially "|=") *is* suitable for "update", I think we
> should reserve "+" for some future commutative extension.
>

One argument is that sets have an update() method aliased to "|=", so this
makes it more reasonable to do the same for dicts, which also have an
update() method, with similar behavior (not surprising, since sets were
modeled after dicts).
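
A short sketch of the parallel, using only operations that exist today (the dict operator itself is what is being proposed, so it is not used here):

```python
# For sets, update() and "|=" do the same thing when given another set.
s1 = {1, 2}
s1.update({2, 3})

s2 = {1, 2}
s2 |= {2, 3}

# dict.update() merges in place; on a key clash the argument's value wins.
d = {"a": 1, "b": 2}
d.update({"b": 20, "c": 30})
```

Both sets end up as {1, 2, 3}, and d ends up with "b" overwritten and "c" added.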


> In the spirit of full disclosure:
> Of these, 2 is already implemented and widely used, so we don't need
> to use dict.__add__ for that.  I've never seen 4 in the mathematical
> literature (union of relations is not the same thing).  3, however, is
> very common both for mappings with small domain and sparse
> representation of mappings with a default value (possibly computed
> then cached), and "|" is not suitable for expressing that sort of
> addition (I'm willing to say it's "wrong" :-).
>

-- 
--Guido van Rossum (python.org/~guido)