Creating an object that can track when its attributes are modified

Fri Mar 8 13:50:58 EST 2013

On Wed, 06 Mar 2013 16:26:57 -0800, Ben Sizer wrote:

> On Thursday, 7 March 2013 00:07:02 UTC, Steven D'Aprano  wrote:
[...]
>> Actually I lie. I would guess that the simple, most obvious way is
>> faster: don't worry about storing what changed, just store
>> *everything*. But I could be wrong.
> 
> The use case I have is not one where that is suitable. It's not the
> snapshots that are important, but the changes between them.

I'm afraid that doesn't make much sense to me. You're performing 
calculations, and stuffing them into instance attributes, but you don't 
care about the result of the calculations, only how they differ from the 
previous result?

I obviously don't understand the underlying problem you're trying to 
solve.

>> Fortunately, Python development is rapid enough that you can afford to
>> develop this object the straightforward way, profile your application
>> to see where the bottlenecks are, and if it turns out that the simple
>> approach is too expensive, then try something more complicated.
> 
> I don't see a more straightforward solution to the problem I have than
> the one I have posted. I said that a system that took snapshots of the
> whole object and attempted to diff them would probably perform worse,
> but it would probably be more complex too, given the traversal and
> copying requirements.

Yes, and I said that your intuition of what will be fast and what will be 
slow is not necessarily trustworthy. Without testing, neither of us knows 
for sure.

Given the code you showed in the original post, I don't see that the 
traversal and copying requirements are terribly complicated. You don't do 
deep-copies of attributes, so a shallow copy of the instance __dict__ 
ought to be enough. Assuming you have a well-defined "start processing" 
moment, just grab a snapshot of the dict, which will be fast, then do 
your calculations, then call get_changes:

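    def snapshot(self):
        snap = self.__dict__.copy()
        snap.pop('_snapshot', None)  # don't keep a chain of old snapshots
        self._snapshot = snap

    def get_changes(self):
        sentinel = object()
        # Attributes that are new, or whose value differs from the snapshot
        return dict( [ (k,v) for k,v in self.__dict__.items()
            if k != '_snapshot'
            and v != self._snapshot.get(k, sentinel) ] )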

This doesn't support *deleting* attributes, but neither does your 
original version.
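
If deletions ever did matter, the same snapshot could be compared in the 
other direction as well. What follows is only a minimal sketch of that 
idea, not code from either version in this thread: the class name Tracked 
and the get_deleted method are made up for illustration, and the example 
assumes attribute values can be compared with != in a meaningful way.

```python
class Tracked(object):
    """Hypothetical example class; not the class from the original post."""

    def snapshot(self):
        # Shallow copy of the instance dict, minus any previous snapshot.
        snap = self.__dict__.copy()
        snap.pop('_snapshot', None)
        self._snapshot = snap

    def get_changes(self):
        # Attributes that are new, or whose value differs from the snapshot.
        sentinel = object()
        old = self._snapshot
        return dict((k, v) for k, v in self.__dict__.items()
                    if k != '_snapshot' and v != old.get(k, sentinel))

    def get_deleted(self):
        # Hypothetical extension: names in the snapshot that are gone now.
        return sorted(k for k in self._snapshot if k not in self.__dict__)


obj = Tracked()
obj.x, obj.y, obj.w = 1, 2, 0
obj.snapshot()
obj.x = 10      # modified
obj.z = 3       # added after the snapshot
del obj.w       # deleted after the snapshot
changes = obj.get_changes()
deleted = obj.get_deleted()
print(changes, deleted)
```

Here get_changes reports x and z with their new values, and get_deleted 
reports w. Nothing is intercepted at assignment time; all the work happens 
in the two comparison methods.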

Obviously I don't know for sure which strategy is fastest, but since your 
version already walks the entire __dict__, this shouldn't be much slower, 
and has a good chance of being faster.

(Your version slows down *every* attribute assignment. My version does 
not.)

By the way, your original version describes the get_changes_and_clean() 
method as cleaning the dirty *flags*. But the implementation doesn't 
store flags. Misleading documentation is worse than no documentation.

But if you insist on the approach you've taken, you can simplify the 
__setattr__ method:

    def __setattr__(self, key, value):
        # If the first modification to this attribute, store the old value
        dirty = self._dirty_attributes
        if key not in dirty:
            dirty[key] = getattr(self, key, None)
        # Set the new value
        object.__setattr__(self, key, value)

You might try this (slightly) obfuscated version, which *could* be faster 
still, although I doubt it, since getattr now runs on every assignment:

    def __setattr__(self, key, value):
        # If the first modification to this attribute, store the old value
        self._dirty_attributes.setdefault(key, getattr(self, key, None))
        # Set the new value
        object.__setattr__(self, key, value)

But if you really need to get every bit of performance, it's worth trying 
them both and seeing which is faster.

(P.S. I trust you know to use timeit for timing small code snippets, 
rather than rolling your own timing code?)
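
For what it's worth, a comparison of the two __setattr__ variants with 
timeit might look roughly like the sketch below. The class names, the 
time_assignments helper, and the __init__ that creates _dirty_attributes 
via object.__setattr__ are all assumptions made for the example, since 
the rest of the original class isn't shown here. The numbers will depend 
on the machine and the Python version, so none are quoted.

```python
import timeit


class Base(object):
    def __init__(self):
        # Bypass the overridden __setattr__ so the dict exists before use.
        object.__setattr__(self, '_dirty_attributes', {})


class WithTest(Base):
    def __setattr__(self, key, value):
        # Explicit membership test: getattr only runs on the first change.
        dirty = self._dirty_attributes
        if key not in dirty:
            dirty[key] = getattr(self, key, None)
        object.__setattr__(self, key, value)


class WithSetdefault(Base):
    def __setattr__(self, key, value):
        # setdefault variant: getattr is evaluated on every assignment.
        self._dirty_attributes.setdefault(key, getattr(self, key, None))
        object.__setattr__(self, key, value)


def time_assignments(cls, number=20000, repeat=3):
    """Best-of-`repeat` time for `number` assignments to one attribute."""
    obj = cls()

    def assign():
        obj.x = 1

    return min(timeit.repeat(assign, number=number, repeat=repeat))


results = dict((cls.__name__, time_assignments(cls))
               for cls in (WithTest, WithSetdefault))
for name in sorted(results):
    print("%-15s %.4f seconds" % (name, results[name]))
```

Taking the minimum of several repeats is the usual way to reduce noise 
from whatever else the machine is doing. This only measures repeated 
assignment to a single attribute that is already dirty; a realistic test 
would use the access pattern of the actual application.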

-- 
Steven