[Python-Dev] PEP 203 Augmented Assignment

Guido van Rossum guido@beopen.com
Thu, 27 Jul 2000 00:59:15 -0500


I've been thinking a bit about this myself, and I think it's a good idea
to show the bytecodes generated for the various cases, just to make
sure that we understand the semantics.  I'll just use +=, but the same
list applies to all 11 binary operators (**, *, /, %, +, -, |, &, ^,
<<, >>).

I'm making up opcodes -- the different variants of LOAD and STORE
don't matter.  On the right I'm displaying the stack contents after
execution of the opcode (push appends to the end).  I'm writing
'result' to indicate the result of the += operator.

  a += b

      LOAD a			[a]
      LOAD b			[a, b]
      AUGADD			[result]
      STORE a			[]

  a.attr += b

      LOAD a			[a]
      DUP			[a, a]
      GETATTR 'attr'		[a, a.attr]
      LOAD b			[a, a.attr, b]
      AUGADD			[a, result]
      SETATTR 'attr'		[]

  a[i] += b

      LOAD a			[a]
      DUP			[a, a]
      LOAD i			[a, a, i]
      DUP			[a, a, i, i]
      ROT3			[a, i, a, i]
      GETITEM			[a, i, a[i]]
      LOAD b			[a, i, a[i], b]
      AUGADD			[a, i, result]
      SETITEM			[]
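
To check the stack diagrams above, here is a minimal sketch of an
interpreter for the made-up opcodes, written in present-day Python.
The function name run(), the tuple encoding of the program, and the
exact pop order of GETITEM and SETITEM are assumptions for
illustration only; they are chosen to match the stack contents shown
in the listing, not taken from any real bytecode set.

```python
def run(program, names):
    """Execute the made-up opcodes; return the stack after each one."""
    stack, trace = [], []
    for op, *args in program:
        if op == "LOAD":
            stack.append(names[args[0]])
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "ROT3":
            # Move the top item two positions down: [x, y, z] -> [z, x, y].
            top = stack.pop()
            stack.insert(len(stack) - 2, top)
        elif op == "GETITEM":
            key = stack.pop()
            obj = stack.pop()
            stack.append(obj[key])
        elif op == "AUGADD":
            right = stack.pop()
            left = stack.pop()
            left += right          # in place if the left operand supports it
            stack.append(left)
        elif op == "SETITEM":
            # Assumed layout, matching the listing: [obj, key, value].
            value = stack.pop()
            key = stack.pop()
            obj = stack.pop()
            obj[key] = value
        else:
            raise ValueError("unknown opcode: %r" % (op,))
        trace.append(list(stack))  # snapshot of the stack after this opcode
    return trace


# The listing for "a[i] += b", opcode by opcode.
program = [
    ("LOAD", "a"), ("DUP",), ("LOAD", "i"), ("DUP",), ("ROT3",),
    ("GETITEM",), ("LOAD", "b"), ("AUGADD",), ("SETITEM",),
]
names = {"a": [10, 20, 30], "i": 1, "b": 5}
trace = run(program, names)
```

Each entry of trace corresponds to one line of the right-hand column
in the listing; the stack is empty at the end and names["a"] has been
updated in place.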

I'm leaving the slice variant out; I'll get to that in a minute.

If the right hand side is more complicated than in the example, the
line 'LOAD b' is simply replaced by code that calculates the value of
the expression; this always ends up eventually pushing a single value
onto the stack, leaving anything below it alone, just like the 'LOAD
b' opcode.  Ditto for the index expression ('i' in the example).

Similarly, for the cases a.attr and a[i], if instead of a there's a
more complicated expression (e.g. sys.modules[name].foo().bar += 1)
the initial 'LOAD a' is replaced by code that loads the object on the
stack -- in this example, sys.modules[name].foo().  Only the final
selector (".attr", "[i]") is special.  (Terminology: a selector is
something that addresses a possibly writable component of a container
object, e.g. a[i] or a.attr; a[i:j] is also a selector.  f() could be
seen as a selector but cannot be used on the left hand side of an
assignment.)
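
The DUP in the listings implies that the container expression and the
index expression are each evaluated only once.  A small sketch in
present-day Python (where augmented assignment exists) makes this
observable; the names Box, foo, index, and calls are made up for the
example.

```python
calls = []          # records every evaluation of a helper function


class Box:
    def __init__(self):
        self.bar = 0


box = Box()


def foo():
    calls.append("foo")
    return box


def index():
    calls.append("index")
    return 0


data = [10]

foo().bar += 1      # the container expression foo() runs once
data[index()] += 5  # the index expression index() runs once
```

After running this, calls holds exactly one "foo" and one "index",
even though each target is both read and written.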

There are two more forms of potential interest.  First, what should
happen to a tuple assignment?

  a, b, c += x

(Which is exactly the same as "(a, b, c) += x".)

I think this should be a compile-time error.  If t and u are tuples,
"t += u" means the same as "t = t+u"; but if we apply this rule we
would get "(a, b, c) = (a, b, c) + u", which is only valid if u is an
empty tuple (or a class instance with unusual coercion behavior).  But
when u is empty, it's not useful (nothing changes), so it's unlikely
that someone would have this intention.  More likely, the programmer
was hoping that this would be the same as "a+=x; b+=x; c+=x" -- but
that's the same misconception as expecting "a, b, c = 0" to mean "a =
b = c = 0" so we don't need to cater to it.
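
For what it's worth, present-day CPython does reject this form at
compile time.  The sketch below checks that with the built-in
compile(); the helper name compiles() is made up for the example.

```python
def compiles(source):
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


tuple_target_ok = compiles("a, b, c += x")
paren_target_ok = compiles("(a, b, c) += x")
plain_target_ok = compiles("a += x")
```

Only the plain-name target compiles; both spellings of the tuple
target are refused before any code runs.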

Second, what should happen to a slice assignment?  The basic slice
form is:

  a[i:j] += b

but there are others: Python's slice syntax allows an arbitrary
comma-separated sequence of single indexes, regular slices (lo:hi),
extended slices (lo:hi:step), and "ellipsis" tokens ('...') between
the square brackets.  Here's an extreme example:

  a[:, ..., ::, 0:10:2, :10:, 1, 2:, ::-1] += 1

First, let me indicate what code is generated for such a form when
it's used in an ordinary expression or assignment.  Any such form
*except* basic slices (a[i:j], a[:j], a[i:], and a[:]) is translated
into code that uses GETITEM or SETITEM with an index that is formed
from a simple translation of the actual expressions.

  - If there are two or more comma-separated values, the index is a
  tuple of the translations of the individual values.

  - An ellipsis ("...") is translated into the builtin object
  Ellipsis.

  - A non-slice is translated into itself.

  - A slice is translated into a "slice object"; this is a built-in
  object representing the lower and upper bounds and step.  There is
  also a built-in function, slice(), taking 1-3 arguments in the same
  way as range().  Thus:

    - "lo:hi" is equivalent to slice(lo, hi);

    - "lo:hi:step" is equivalent to slice(lo, hi, step);

    - omitted values are replaced with None, so e.g. ":hi" is
    equivalent to slice(None, hi).

So, the extreme example above means exactly the same as a[x], where x
is a tuple with the following items:

  slice(None, None)
  Ellipsis
  slice(None, None, None)
  slice(0, 10, 2)
  slice(None, 10, None)
  1
  slice(2, None)
  slice(None, None, -1)
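
The translation can be checked directly with a class whose getitem
simply hands back whatever index it receives.  This is a sketch in
present-day Python; the class name ShowIndex is made up for the
example.

```python
class ShowIndex:
    def __getitem__(self, index):
        # Return the index object exactly as the subscript built it.
        return index


a = ShowIndex()
x = a[:, ..., ::, 0:10:2, :10:, 1, 2:, ::-1]

expected = (
    slice(None, None),
    Ellipsis,
    slice(None, None, None),
    slice(0, 10, 2),
    slice(None, 10, None),
    1,
    slice(2, None),
    slice(None, None, -1),
)
```

Here x compares equal to expected, item by item.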

Why all this elaboration?  Because I want to use this to give a
standardized semantics even to basic slices.  If a[lo:hi:step] is
translated the same as a[slice(lo, hi, step)], then we can give
a[lo:hi] the same translation as a[slice(lo, hi)], and thus the slice
case for augmented assignment can generate the same code (apart from
the slice-building operations) as the index case.  Thus (writing
'slice' to indicate the slice object built from i and j):

  a[i:j] += b

      LOAD a			[a]
      DUP			[a, a]
      LOAD i			[a, a, i]			**
      LOAD j			[a, a, i, j]			**
      BUILD_SLICE 2		[a, a, slice]			**
      DUP			[a, a, slice, slice]
      ROT3			[a, slice, a, slice]
      GETITEM			[a, slice, a[slice]]
      LOAD b			[a, slice, a[slice], b]
      AUGADD			[a, slice, result]
      SETITEM			[]

Comparing this to the code for "a[i] += b", only the three lines
marked with ** are really different, and all that these do is to push
a single object representing the slice onto the stack.
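
The same shape can be observed in present-day Python 3, where basic
slices are passed to getitem and setitem as slice objects.  The
sketch below logs the calls made by "a[i:j] += b"; the class name
Recorder and its log attribute are made up for the example.

```python
class Recorder:
    """A list wrapper that logs every getitem and setitem call."""

    def __init__(self, items):
        self.items = list(items)
        self.log = []

    def __getitem__(self, index):
        self.log.append(("getitem", index))
        return self.items[index]

    def __setitem__(self, index, value):
        self.log.append(("setitem", index, value))
        self.items[index] = value


a = Recorder([0, 1, 2, 3, 4])
a[1:3] += [9]
```

The log shows one getitem and one setitem, both with the same slice
object, which matches the GETITEM / AUGADD / SETITEM sequence above.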

I won't show the code for "a[i:j:k] += b" or for "a[i:j, k:l] += b",
but it's clear how these should be done.


Postscript (unrelated to augmented assignment)

It would be nice if the SLICE bytecodes were removed altogether and
instead slice() objects would be created for all slices, even basic
ones.  (I believe this was proposed on this list at some point.)  The
original SLICE opcodes were introduced in ancient times, when basic
slices were the only accepted slice syntax.

This would mean that all objects supporting slices would have to
support the *mapping* interface instead of (or in addition to) the
sequence interface; the mapping interface would have to determine
whether a getitem / setitem call was really a slice call and do the
right thing.

In particular, for backward compatibility, class instances could have
a mapping interface whose internal getitem function checks if the
argument is a slice object whose step is None and whose lo and hi are
None or integers; then if a __getslice__ method exists, it could call
that, in all other cases it could call __getitem__.
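
A rough sketch of that dispatch rule, written as ordinary Python
rather than as interpreter internals.  The function names
is_basic_slice() and instance_getitem() are made up, and replacing
omitted bounds with 0 and sys.maxsize before calling __getslice__ is
an assumption; the paragraph above doesn't say what values should be
passed.

```python
import sys


def is_basic_slice(index):
    """True for a slice with no step and with None-or-integer bounds."""
    if not isinstance(index, slice) or index.step is not None:
        return False
    return all(bound is None or isinstance(bound, int)
               for bound in (index.start, index.stop))


def instance_getitem(obj, index):
    """Route basic slices to __getslice__ if present, else __getitem__."""
    getslice = getattr(obj, "__getslice__", None)
    if getslice is not None and is_basic_slice(index):
        # Assumed defaults for omitted bounds (not stated in the text).
        lo = 0 if index.start is None else index.start
        hi = sys.maxsize if index.stop is None else index.stop
        return getslice(lo, hi)
    return obj.__getitem__(index)


class Old:
    """Hypothetical old-style class that defines both methods."""

    def __getslice__(self, lo, hi):
        return ("getslice", lo, hi)

    def __getitem__(self, index):
        return ("getitem", index)


basic = instance_getitem(Old(), slice(1, 4))
open_lo = instance_getitem(Old(), slice(None, 4))
extended = instance_getitem(Old(), slice(1, 4, 2))
single = instance_getitem(Old(), 3)
```

Only the two basic slices reach __getslice__; the extended slice and
the plain integer index fall through to __getitem__.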

None of the other built-in objects that support slices would have to
be changed; the GETITEM opcode could notice that an object supports
the sequence interface but not the mapping interface, and then look
for a basic slice or an integer and do the right thing.

Problems with this are mostly related to the existing C API for
slices, such as PySequence_GetSlice(), which propagates the various
restrictions.

Too-much-rambling-reduces-the-chance-of-useful-responses-ly,

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)