Augmented assignment again

Sun Sep 3 08:26:48 EDT 2000

On Wed, Aug 30, 2000 at 11:37:29PM +0200, Roeland Rengelink wrote:
> Thomas Wouters wrote:

> > As soon as you mix normal binary operators in there, you will get new
> > objects.

> The objective of the exercise was not to completely avoid the creation
> of new objects, just the superfluous creation. I.e. can we implement
> __add__(self, other) in terms of copies and augmented assignment in such
> a way that 
> 
> d = a+b+c
> 
> only results in one copy (of a), and not two (of a and the result of
> a+b)

The problem is that Python does not know (and currently cannot know) which
objects are 'superfluous'. The expression 'd = a+b+c' is split into 
"d = ((a+b)+c)", meaning that "a+b" is first calculated (call the result
'res'), and then 'res+c' is calculated, and then the result of that is
stored in 'd'. The "a+b" expression is executed without knowledge of what is
going to happen with the resulting object (which is precisely the reason why
something like 'a = a + 1' isn't the same as 'a += 1', in Python 2.0.)
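
A minimal sketch of that evaluation order. The 'Image' class and its
'created' counter are hypothetical, invented here only to count how many
new objects a plain __add__ builds; they are not from Roeland's code.

```python
class Image:
    """Hypothetical stand-in for a large array; counts allocations."""
    created = 0

    def __init__(self, data):
        self.data = list(data)
        Image.created += 1

    def __add__(self, other):
        # A plain binary operator cannot tell whether its result is a
        # temporary, so it always builds a new object.
        return Image(x + y for x, y in zip(self.data, other.data))


a, b, c = Image([1, 2]), Image([3, 4]), Image([5, 6])
Image.created = 0            # ignore the three operands themselves
d = a + b + c                # evaluated as (a + b) + c
new_objects = Image.created  # one for 'a + b', one for 'res + c'
```

The intermediate 'a + b' object is dropped as soon as 'res + c' has been
computed, but it has been allocated and filled by then.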

Allowing for this kind of optimization *is* possible now that we have
in-place operations, but it would require a substantial rewrite of the
grammar and the bytecode compiler in current Python. That is not likely to
happen before Py3K, and probably not even then.

> My users will be astronomers that want to do

> reduced_data = (raw_data-bias)/flatfield

> where raw_data may be a stack of, say 30 2k x 4k images. They will not
> like an extraneous copy of 1 GB worth of data, and they will not read
> the manual.
> So, either I don't supply binary arithmetic operations (which they will
> not like/understand), or I solve this.

If you don't care about 'raw_data' after the subtraction, the solution is
to educate your users. Make them write it like this:

raw_data -= bias
raw_data /= flatfield

You can force them to do this by not supplying the normal __sub__ and
__rsub__ operators (and likewise for division), though that also forces
them to make explicit copies whenever they do want a copy.
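
A rough sketch of such a class, under a few assumptions: 'Image' and its
'copy' method are hypothetical names, a plain list stands in for the real
image data, and the division hook is spelled __itruediv__ as in current
Python (Python 2.0 calls __idiv__ for '/=').

```python
class Image:
    """Hypothetical in-place-only array: no __sub__, no division operator."""

    def __init__(self, data):
        self.data = list(data)

    def copy(self):
        # The only way to get a second buffer is to ask for one.
        return Image(self.data)

    def __isub__(self, other):
        for i, y in enumerate(other.data):
            self.data[i] -= y
        return self          # rebinds the name to the same object

    def __itruediv__(self, other):
        for i, y in enumerate(other.data):
            self.data[i] /= y
        return self


raw_data = Image([10.0, 20.0])
bias = Image([2.0, 4.0])
flatfield = Image([2.0, 4.0])

before = raw_data
raw_data -= bias
raw_data /= flatfield
```

Because only the in-place hooks exist, 'raw_data - bias' raises a
TypeError, and a user who wants to keep the original has to write
'reduced = raw_data.copy()' first.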

> Well, I give it one more try below. But note that this would have been
> trivial if we had an additional 'magic' method __assign__ (or
> __finalize__) which is called whenever an instance is (first) bound. I
> think this post shows that binding an instance can be a significant
> moment in an instance's lifetime, on par with creation (__init__) and
> deletion (__del__). So why not have a special class method that is
> called at these moments?

Because it would not be Pythonic ;) It's also bloody hard to see when an
object is "first bound". You can't tell from the refcount, because an object
can be referenced many times without actually being stored anywhere yet.
You'd suddenly have to keep track of 'boundness' for all objects in Python.
And you need to make some hard decisions on what constitutes a 'store'
operation. What if you write your own container class, which has a
'__setitem__' that does *more* calculation on the object? Would you want it
to work on the 'finalized' object, or on the 'temporary' one?

You can 'hack' it by providing your own containers, but it's not a very nice
solution. I think an explicit finalizing step, offered alongside both the
normal binary and the in-place operations, plus proper education of the
programmers (because if they write Python, they *are* programmers) should be
enough to work around this problem. If people refuse to read the
documentation, they'll pay the price in memory use. If this isn't enough,
fall back to instance methods. (Or provide all of them, and let the users
choose.)
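
One possible shape for the instance-method fallback, as a sketch only. The
method names 'copy', 'subtract' and 'divide' are made up for illustration;
each one works in place and returns self, so calls can be chained behind a
single explicit copy.

```python
class Image:
    """Hypothetical array type with in-place instance methods."""

    def __init__(self, data):
        self.data = list(data)

    def copy(self):
        return Image(self.data)

    def subtract(self, other):
        # In place; returning self allows chaining.
        for i, y in enumerate(other.data):
            self.data[i] -= y
        return self

    def divide(self, other):
        for i, y in enumerate(other.data):
            self.data[i] /= y
        return self


raw_data = Image([10.0, 20.0])
bias = Image([2.0, 4.0])
flatfield = Image([2.0, 4.0])

# Exactly one copy is made, and raw_data itself is left untouched.
reduced_data = raw_data.copy().subtract(bias).divide(flatfield)
```

The copy is visible in the source here, which is the point: nothing is
duplicated unless the user asks for it.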

If the people you target can't handle Python, you shouldn't let them write
Python -- write your own little language instead ;)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!