[Python-ideas] PEP: Dict addition and subtraction

Josh Rosenberg shadowranger+pythonideas at gmail.com
Tue Mar 5 18:48:41 EST 2019


On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <steve at pearwood.info> wrote:

> On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
>
> > I propose that the + sign merge two python dictionaries such that if
> > there are conflicting keys, a KeyError is thrown.
>
> This proposal is for a simple, operator-based equivalent to
> dict.update() which returns a new dict. dict.update has existed since
> Python 1.5 (something like a quarter of a century!) and never grown a
> "unique keys" version.
>
> I don't recall even seeing a request for such a feature. If such a
> unique keys version is useful, I don't expect it will be useful often.
>
>
I have one argument in favor of such a feature: It preserves concatenation
semantics. + means one of two things in all code I've ever seen (Python or
otherwise):

1. Numeric addition (including element-wise numeric addition as in Counter
and numpy arrays)
2. Concatenation (where the result preserves all elements, in order,
including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 +
seq2))

dict addition that didn't reject non-unique keys wouldn't fit *either*
pattern; under the main proposal (making it equivalent to left.copy(),
followed by .update(right)), the left hand side would win on ordering, the
right hand side on values, and the length invariant of concatenation
wouldn't be preserved. At least when repeated keys are rejected, most
concatenation invariants are preserved; order is all of the left elements
followed by all of the right, and no elements are lost.
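
To make that concrete, here is a minimal sketch of the copy-then-update
semantics next to a key-rejecting variant. The name strict_merge is
hypothetical, made up for illustration; it is not part of any proposal text
or of the dict API.

```python
# Main proposal, spelled out: left.copy() followed by .update(right).
left = {'a': 1, 'b': 2}
right = {'b': 20, 'c': 30}

merged = left.copy()
merged.update(right)

# Ordering comes from the left operand ('b' keeps its original slot),
# but the value for the repeated key comes from the right operand.
print(merged)  # {'a': 1, 'b': 20, 'c': 30}

# The concatenation length invariant fails as soon as a key repeats.
print(len(left) + len(right), len(merged))  # 4 3


def strict_merge(d1, d2):
    """Hypothetical helper: merge that raises KeyError on repeated keys."""
    dupes = d1.keys() & d2.keys()
    if dupes:
        raise KeyError(f"duplicate keys: {dupes!r}")
    result = d1.copy()
    result.update(d2)
    return result


# With disjoint keys, order is all of the left followed by all of the
# right, and the lengths add up.
disjoint = strict_merge({'a': 1, 'b': 2}, {'c': 3})
```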


>
> > This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.
>
> One of the reasons for preferring + is that it is an obvious way to do
> something very common, while {**d1, **d2} is as far from obvious as you
> can get without becoming APL or Perl :-)
>
>
From the moment PEP 448 was published, I've been using unpacking as a more
composable/efficient form of concatenation, merging, etc. I'm sorry you
don't find it obvious, but a couple e-mails back you said:

"The Zen's prohibition against guessing in the face of ambiguity does not
mean that we must not add a feature to the language that requires the
user to learn what it does first."

Learning to use the unpacking syntax in the case of function calls is
necessary for tons of stuff (writing general function decorators, handling
initialization in class hierarchies, etc.), and as PEP 448 is titled, this
is just a generalization combining the features of unpacking arguments with
collection literals.
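
As a rough illustration of that point (the decorator and function names
here are invented for the example), the same * and ** spellings used to
forward arguments in a general decorator are the ones PEP 448 allows inside
collection literals:

```python
import functools


def logged(func):
    """Hypothetical general-purpose decorator that forwards any arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Record the call, then pass everything through unchanged.
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@logged
def add(x, y=0):
    return x + y


result = add(1, y=2)

# PEP 448: the same unpacking syntax inside list, set and dict displays.
a, b = [1, 2], (3, 4)
combined_list = [*a, *b]
combined_set = {*a, *b}
combined_dict = {**{'x': 1}, **{'y': 2}}
```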

> > The second syntax makes it clear that a new dictionary is being
> > constructed and that d2 overrides keys from d1.
>
> Only because you have learned the rule that {**d, **e} means to
> construct a new dict by merging, with the rule that in the event of
> duplicate keys, the last key seen wins. If you hadn't learned that rule,
> there is nothing in the syntax which would tell you the behaviour. We
> could have chosen any rule we liked:
>
>
No, because we learned the general rule for dict literals that {'a': 1,
'a': 2} produces {'a': 2}; the unpacking generalizations were very good
about adhering to the existing rules, so it was basically zero learning
curve if you already knew dict literal rules and less general unpacking
rules. The only part to "learn" is that when there is a conflict between
dict literal rules and function call rules, dict literal rules win.
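
A small sketch of the two rule sets side by side (show is just a throwaway
function for the example): a dict display lets the last duplicate win, with
or without unpacking, while a function call rejects the duplicate keyword.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}

# Plain dict literal rule: the last value for a repeated key wins.
literal = {'a': 1, 'a': 2}

# Unpacking inside a dict display follows the same rule.
unpacked = {**d1, **d2}


def show(**kwargs):
    return kwargs


# Function call rule: a keyword repeated across unpackings is an error.
try:
    show(**d1, **d2)
except TypeError as exc:
    call_error = exc
else:
    call_error = None
```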

To be clear: I'm not supporting + raising an error on non-unique keys. Even
if it makes dict + dict adhere to the rules of concatenation, I don't think
it's common or useful functionality. My order of preferences is roughly:

1. Do nothing (even if you don't like {**d1, **d2}, .copy() followed by
.update() is obvious, and we don't need more than one way to do it)
2. Add a new method to dict, e.g. dict.merge (whether it's a class method
or an instance method is irrelevant to me)
3. Use | (because dicts are *far* more like sets than they are like
sequences, and the semi-lossy rules of unioning make more sense there); it
would also make - make sense, since + is only matched by - in numeric
contexts; on collections, | and - are paired. And I consider the -
functionality the most useful part of this whole proposal (because I *have*
wanted to drop a collection of known blacklisted keys from a dict and while
it's obvious you can do it by looping, I always wanted to be able to do
something like d1.keys() -= badkeys, and remain disappointed nothing like
it is available)
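
For reference, a sketch of what is available today for the "drop known bad
keys" case (variable names are only for illustration). The keys view
supports set operators, but they return a plain set and never touch the
dict, so the dict itself still needs a comprehension or a loop:

```python
d1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
badkeys = {'b', 'd', 'zzz'}

# Non-mutating: build a new dict without the bad keys.
filtered = {k: v for k, v in d1.items() if k not in badkeys}

# Set operators on the keys view give back a set of keys, not a dict.
remaining = d1.keys() - badkeys

# In place: the intersection is a new set, so deleting while iterating
# over it is safe, and keys absent from d1 ('zzz') are skipped.
for k in d1.keys() & badkeys:
    del d1[k]
```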

-Josh Rosenberg