[Python-ideas] Adding "+" and "+=" operators to dict

Andrew Barnert abarnert at yahoo.com
Fri Feb 13 04:40:23 CET 2015


On Feb 12, 2015, at 18:24, Steven D'Aprano <steve at pearwood.info> wrote:

> On Fri, Feb 13, 2015 at 12:06:44AM +0000, Andrew Barnert wrote:
>> On Thursday, February 12, 2015 1:33 PM, Nathan Schneider <neatnate at gmail.com> wrote:
>> 
>> 
>>> A reminder about PEP 448: Additional Unpacking Generalizations 
>>> (https://www.python.org/dev/peps/pep-0448/), which claims that "it 
>>> vastly simplifies types of 'addition' such as combining
>>> dictionaries, and does so in an unambiguous and well-defined way". 
>>> The spelling for combining two dicts would be: {**d1, **d2}
> 
> Very nice! That should be extensible to multiple arguments without the 
> pathologically slow performance of repeated addition:
> 
> {**d1, **d2, **d3, **d4}
> 
> and you can select which dict you want to win in the event of clashes 
> by changing the order.
> 
> 
>> I like that this makes all of the bikeshedding questions obvious: it's 
>> a dict display, so the result is clearly a dict rather than a 
>> type(d1), it's clearly going to follow the same ordering rules for 
>> duplicate keys as a dict display (d2 beats d1), and so on.
>> 
>> And it's nice that it's just a special case of a more general 
>> improvement.
>> 
>> However, it is more verbose (and more full of symbols), and it doesn't 
>> give you an obvious way to, say, merge two OrderedDicts into an 
>> OrderedDict.
> 
> Here's a more general proposal. The dict constructor currently takes 
> either a single mapping or a single iterable of (key,value) pairs. The 
> update method takes a single mapping, or a single iterable of (k,v) 
> pairs, AND any arbitrary keyword arguments.

The constructor also takes any arbitrary keyword items.

(In fact, if you look at the source, dict_init, just calls the same dict_update_common function that update calls. So it's as if you defined __init__(self, *args, **kwargs) and update(self, *args, **kwargs) to both just call self.__update_common(*args, **kwargs).)

> We should generalise both of these to take *one or more* mappings and/or 
> iterables, and solve the most common forms of copy-and-update.

I like this idea, but it's not perfect--see below.

> That 
> avoids the (likely) pathologically slow behaviour of repeated addition, 
> avoids any confusion over operators and having += duplicating the 
> update method.
> 
> Then, merging in place uses:
> 
> d1.update(d2, d3, d4, d5)
> 
> and copy-and-merge uses:
> 
> dict(d1, d2, d3, d4, d5)  # like d1+d2+d3+d4+d5
> 
> where the d's can be any mapping, and you can optionally include keyword 
> arguments as well.
> 
> You don't have to use dict, you can use any Mapping which uses the same 
> constructor semantics.

Except that every existing Mapping that anyone has developed so far doesn't use the same constructor semantics, and very few of them will automatically change to do so just because dict does. Basically, unless it subclasses or delegates to dict and either doesn't override __init__, or does so with *args, **kwargs and passes those along untouched, it won't work. And that's not very common.

For example, despite what you say below, OrderedDict (as currently written) will not work; it never passes the __init__ arguments to dict. Or UserDict; it passes them to MutableMapping instead, and only after checking that there's only 0 or 1 positional arguments. Or any of three third-party sorted dicts I looked at.

I think defaultdict actually will work (its __init__ slot pulls the first argument, then passes the rest to dict blindly, and its update is inherited from dict), as will the Python-equivalent recipe (which I'd guess is available on PyPI as a backport for 2.4, but I haven't looked).

So, this definitely is not as general a solution as adding a new method to dict and Mapping would be; it will only help dict, a handful of other classes.

Still, I like this idea.

It defines a nice idiom for dict-like constructors to follow--and classes written to that idiom will actually work out of the box in 3.4, and even 2.7. That's pretty cool.

And for people who want to use this with plain old dicts in 2.7 and 3.4, all they need to do is use your trivial dict subclass. (Of course that still won't let them call update with multiple args on a dict created as a literal or passed in from third-party code, but it will let them do the copy-and-update the same way as in 3.5, and I think that's more important than the update-with-multiple-args.)

It might be worth changing MutableMapping.update to handle the new signature, but that raises the question of whether that's a backward-incompatible change. And, because not all MutableMapping classes delegate __init__ to update the way OrderedDict does, it still doesn't solve every type (or even most types).

> That means subclasses of dict ought to work 
> (unless the subclass does something silly) and even OrderedDict ought to 
> work (modulo the fact that regular dicts and keyword args are 
> unordered).

I suppose it depends what you mean by "something silly", but I think it's pretty common for subclasses to define their own __init__, and sometimes update, and not pass the args through blindly.

More importantly, I don't think dict subclasses are nearly as common as mapping-protocol classes that delegate to or don't even touch a dict (whether subclasses of Mapping or not), so I don't think it would matter all that much if dict subclasses worked.

And again, OrderedDict will not work.

> Here's a proof of concept:
> 
> class Dict(dict):
>    def __init__(self, *mappings, **kwargs):
>        assert self == {}
>        Dict.update(self, *mappings, **kwargs)

If you want to ensure that overriding update doesn't affect __init__ (I'm not _sure_ that you do--yes, UserDict does so, but UserDict is trying to make sure it has exactly the same behavior as dict when subclassed, not trying to improve the behavior of dict), why not have both __init__ and update delegate to a private __update method (as UserDict does)? I'm not sure it makes much difference, but it seems like a more explicit way of saying "this will not use overridden behavior".

>    def update(self, *mappings, **kwargs):
>        for mapping in (mappings + (kwargs,)):
>            if hasattr(mapping, 'keys'):
>                for k in mapping:
>                    self[k] = mapping[k]
>            else:
>                for (k,v) in mapping:
>                    self[k] = v

The current dict constructor/update function iterates over mapping.keys(), not directly over mapping; that means it work with "hybrid" types that iterate like sequences (or something else) but also have keys/values/items. And of course it has special fast-path code for dealing with real dict objects. (Also see MutableMapping.update; it also iterates over mapping.keys() if hasattr(mapping, 'keys'), and it also has a fast-path iterate over mapping only for really with real Mapping classes.)

I think this would be simpler, more correct, and also probably much more efficient, for a pure-Python wrapper class:

    def update(self, *mappings, **kwargs):
        for mapping in (mappings + (kwargs,)):
            dict.update(self, mapping)

And for the real C definition, you're basically just moving the guts of dict_update_common (which are trivial) into a loop over the args instead of an if over a single arg.



More information about the Python-ideas mailing list