[Python-ideas] Augmented assignment [was Re: Adding "+" and "+=" operators to dict]

Andrew Barnert abarnert at yahoo.com
Sun Feb 15 07:12:30 CET 2015


On Feb 14, 2015, at 21:40, Steven D'Aprano <steve at pearwood.info> wrote:

> On Sat, Feb 14, 2015 at 07:10:19PM -0800, Andrew Barnert wrote:
>> On Feb 14, 2015, at 17:30, Steven D'Aprano <steve at pearwood.info> wrote:
> [snip example of tuple augmented assignment which both succeeds and 
> fails at the same time]
> 
>>>> I have argued that this never would have come up if augmented assignment
>>>> were only used for in-place operations,
>>> 
>>> And it would never happen if augmented assignment *never* was used for 
>>> in-place operations. If it always required an assignment, then if the 
>>> assignment failed, the object in question would be unchanged.
>>> 
>>> Alas, there's no way to enforce the rule that __iadd__ doesn't modify 
>>> objects in place, and it actually is a nice optimization when they can 
>>> do so.
>> 
>> No, it's not just a nice optimization, it's an important part of the 
>> semantics. The whole point of being able to share mutable objects is 
>> being able to mutate them in a way that all the sharers see.
> 
> Sure. I didn't say that it was "just" (i.e. only) a nice optimization. 
> Augmented assignment methods like __iadd__ are not only permitted but 
> encouraged to perform changes in place.
> 
> As you go on to explain, those semantics are language design choice. Had 
> the Python developers made different choices, then Python would 
> naturally be different. But given the way Python works, you cannot 
> enforce any such "no side-effects" rule for mutable objects.
> 
> 
>> If __iadd__ didn't modify objects in-place, you'd have this:
>> 
>>    py> a, b = [], []
>>    py> c = [a, b]
>>    py> c[1] += [1]
>>    py> b
>>    []
> 
> Correct. And that's what happens if you write c[1] = c[1] + [1]. If you 
> wanted to modify the object in place, you could write c[1].extend([1]) 
> or c[1].append(1).
> 
> The original PEP for this feature:
> 
> http://legacy.python.org/dev/peps/pep-0203/
> 
> lists two rationales:
> 
> "simplicity of expression, and support for in-place operations". It 
> doesn't go into much detail for the reason *why* in-place operations are 
> desirable, but the one example given (numpy arrays) talks about avoiding 
> needing to create a new object, possibly running out of memory, and then 
> "possibly delete the original object, depending on reference count". To 
> me, saving memory sounds like an optimization :-)
> 
> But of course you are correct Python would be very different indeed if 
> augmented assignment didn't allow mutation in-place.
> 
> 
> [...]
>>> I wonder if we can make this work more clearly if augmented assignments 
>>> checked whether the same object is returned and skipped the assignment 
>>> in that case?
>> 
>> I already answered that earlier. There are plenty of objects that 
>> necessarily rely on the assumption that item/attr assignment always 
>> means __setitem__/__setattr__.
>> 
>> Consider any object with a cache that has to be invalidated when one 
>> of its members or attributes is set. Or a sorted list that may have to 
>> move an element if it's replaced. Or really almost any case where a 
>> custom __setattr__, an @property or custom descriptor, or a 
>> non-trivial __setitem__ is useful. All of these would break.
> 
> Are there many such objects where replacing a member with itself is a 
> meaningful change?

Where replacing a member with a mutated version of the member is meaningful, sure.

Consider a sorted list again. Currently, it's perfectly valid to do sl[0] += x, because sl gets a chance to move the element, while it's not valid (according to the contract of sortedlist) to call sl[0].extend(x). This is relatively easy to understand and remember, even if it may seem odd to someone who doesn't understand Python assignment under the covers. (I could dig up a few StackOverflow questions where something like "use sl[0] += x" is the accepted answer--although sadly few if any of them explain _why_ there's a difference...)
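
To make that concrete, here's a toy sketch of a sorted container (purely illustrative--not the real sortedcontainers API, and the class is made up for this example):

    import bisect

    class SortedList:
        def __init__(self, items=()):
            self._items = sorted(items)
        def __getitem__(self, i):
            return self._items[i]
        def __setitem__(self, i, value):
            # Assignment gives the container a chance to keep its
            # invariant: remove the old element, re-insert in order.
            del self._items[i]
            bisect.insort(self._items, value)
        def __repr__(self):
            return 'SortedList(%r)' % (self._items,)

    sl = SortedList([[1], [1, 2]])
    sl[0] += [5]         # __setitem__ runs; [1, 5] is re-inserted in order
    print(sl)            # SortedList([[1, 2], [1, 5]])

    sl = SortedList([[1], [1, 2]])
    sl[0].extend([5])    # bypasses __setitem__; order silently breaks
    print(sl)            # SortedList([[1, 5], [1, 2]])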

Obviously if we changed += to be the same as update, people could find _different_ answers. For example, you can always write "tmp = sl[0]; tmp.extend(x); sl[0] = tmp" to explicitly reintroduce assignment where we've taken it away. But would you suggest that's an improvement?

You could (and do) argue that any class that relies on assignment to maintain sorting, flush caches, update proxies, whatever is a badly-designed class, and this feature has turned out to be an attractive nuisance. (To be honest, the fact that a dict can effectively guarantee that its keys won't change by requiring hashability, while a sorted container has to document "please don't change the keys", is itself a wart on the language--but not one I'd suggest fixing.) But this really is a more significant change than you're making out.

> E.g. would the average developer reasonably expect that as a deliberate 
> design feature, spam = spam should be *guaranteed* to not be a no-op? I 
> know that for Python today, it may not be a no-op, if spam is an 
> expression such as foo.bar or foo[bar], and I'm not suggesting that it 
> would be reasonable to change the compiler semantics so that "normal" = 
> binding should skip the assignment when both sides refer to the same 
> object.
> 
> But I wonder whether *augmented assignment* should do so. I don't do 
> this lightly, but only to fix a wart in the language. See below.

This is a wart that's been there since the feature was added in the early 2.x days, and most people don't even run into it until they've been using Python for years. (Look at how many experienced Python devs in the previous thread were surprised, because they'd never run into it.) And the workaround is generally trivial: don't use assignment (in the FAQ case, and yours, just call extend instead). And it's easy to understand once you think about the semantics.
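
For anyone who hasn't actually seen it, here's the half-works case and the trivial workaround side by side (this is what current CPython does):

    t = ([1, 2], None)
    try:
        t[0] += [3]
    except TypeError as e:
        print(e)         # 'tuple' object does not support item assignment
    print(t[0])          # [1, 2, 3] -- the list was mutated anyway

    t = ([1, 2], None)
    t[0].extend([3])     # same mutation, no assignment, no exception
    print(t[0])          # [1, 2, 3]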

Creating a much larger, and harder-to-work-around, problem in an unknown number of libraries (and apps that use those libraries) to fix a small wart like this doesn't seem like a good trade.

And of course the _real_ answer in almost all cases is even simpler than the workaround: don't try to assign to objects inside immutable containers. Most people already know not to do this in the first place, which is why they rarely if ever run into the case where it half-works--and when they do, either the += was a mistake or using a tuple instead of a list was, and the exception is enough to point that out.

I suppose it's _possible_ someone has written some code that dealt with the exception and then proceeded to work on bad data, but I don't think it's very likely. And that's really the only code that's affected by the wart.

>> What you're essentially proposing is that augmented assignment is no 
>> longer really assignment, so classes that want to manage assignment in 
>> some way can't manage augmented assignment.
> 
> I am suggesting that perhaps we should rethink the idea that augmented 
> assignment is *unconditionally* a form of assignment. We're doing this 
> because the current behaviour breaks under certain circumstances. If it 
> simply raised an exception, that would be okay, but the fact that the 
> operation succeeds and yet still raises an exception, that's pretty bad.
> 
> In the case of mutable objects inside immutable ones, we know the 
> augmented operation actually doesn't require there to be an assignment, 
> because the mutation succeeds even though the assignment fails. Here's 
> an example again, for anyone skimming the thread or who has gotten lost:
> 
> t = ([1, 2], None)
> t[0] += [1]
> 
> 
> The PEP says:
> 
>    The __iadd__ hook should behave similar to __add__,
>    returning the result of the operation (which could be `self')
>    which is to be assigned to the variable `x'.
> 
> I'm suggesting that if __iadd__ returns self, the assignment be skipped. 
> That would solve the tuple case above. You're saying it might break code 
> that relies on some setter such as __setitem__ or __setattr__ being 
> called. E.g.
> 
> myobj.spam = []  # spam is a property
> myobj.spam += [1]  # myobj expects this to call spam's setter
> 
> But that's already broken, because the caller can trivially bypass the 
> setter for any other in-place mutation:
> 
> myobj.spam.append(1)
> myobj.spam[3:7] = [1, 2, 3, 4, 5]
> del myobj.spam[2]
> myobj.spam.sort()

The fact that somebody can work around your contract doesn't mean that your design is broken. I can usually trivially bypass your read-only x property by setting _x instead; so what? And again, I can break just about every mutable sorted container ever written just by mutating the keys; that doesn't mean sorted containers are useless.
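
For example (an illustrative class, nothing more):

    class Point:
        def __init__(self, x):
            self._x = x
        @property
        def x(self):         # read-only by convention...
            return self._x

    p = Point(1)
    p._x = 99                # ...but nothing stops the determined caller
    print(p.x)               # 99

The contract is still useful even though it can be subverted, and the same goes for a sorted container's "don't mutate the keys" rule.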

> etc. In other words, in the "cache invalidation" case (etc.), no real 
> class that directly exposes a mutable object to the outside world can 
> rely on a setter being called. It would have to wrap it in a proxy to 
> intercept mutator methods, or live with the fact that it won't be 
> notified of mutations.
> 
> I used to think that Python had no choice but to perform an 
> unconditional assignment, because it couldn't tell whether the operation 
> was a mutation or not. But I think I was wrong. If the result of 
> __iadd__ is self, then either the operation was a mutation, or the 
> assignment is "effectively" a no-op. (That is, the result of the op 
> hasn't changed anything.)
> 
> I say "effectively" a no-op in scare quotes because setters will 
> currently be called in this situation:
> 
> myobj.spam = "a"
> myobj.spam += ""  # due to interning, "a" + "" may be the same object
> 
> Currently that will call spam's setter, and the argument will be the 
> identical object as spam's current value. It may be that the setter is 
> rather naive, and it doesn't bother to check whether the new value is 
> actually different from the old value before performing its cache 
> invalidation (or whatever). So you are right that this change will 
> affect some code.
> 
> That doesn't mean we can't fix this. It just means we have to go through 
> a transition period, like for any other change to Python's semantics. 
> During the transition, you may need to import from __future__, or there 
> may be a warning, or both. After the transition, writing:
> 
> myobj.spam = myobj.spam
> 
> will still call the spam setter, always. But augmented assignment may 
> not, if __iadd__ returns the same object. (Not just an object with the 
> same value, it has to be the actual same object.)
> 
> I think that's a reasonable change to make, to remove this nasty gotcha 
> from the language.
> 
> For anyone relying on their cache being invalidated when it is touched, 
> even if the touch otherwise makes no difference, they just have to deal 
> with a slight change in the definition of "touched". Augmented 
> assignment won't work. Instead of using cache.time_to_live += 0 to cause 
> an invalidation, use cache.time_to_live = cache.time_to_live. Or better 
> still, provide an explicit cache.invalidate() method.
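
To spell out what's being proposed, the desugaring of something like t[0] += [1] would change roughly as follows (a sketch, not actual interpreter code; the __add__ fallback is omitted for brevity):

    # today (roughly):
    item = t[0]
    result = item.__iadd__([1])     # list mutates in place, returns self
    t[0] = result                   # the store happens unconditionally;
                                    # for a tuple this raises TypeError

    # under the proposal:
    item = t[0]
    result = item.__iadd__([1])
    if result is not item:          # skip the store when __iadd__ returned
        t[0] = result               # the very same object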

I didn't think of some of these cases. But that just shows that there are _more_ cases that would be broken by the change, which means it's _more_ costly. The fact that these additional cases are easier to work around than the ones I gave doesn't change that. 

