|Title:||Attribute Access Handlers|
|Author:||paul at prescod.net (Paul Prescod)|
It is possible (and even relatively common) in Python code and in extension modules to "trap" when an instance's client code attempts to set an attribute and execute code instead. In other words, it is possible to allow users to use attribute assignment/ retrieval/deletion syntax even though the underlying implementation is doing some computation rather than directly modifying a binding.
This PEP describes a feature that makes it easier, more efficient and safer to implement these handlers for Python instances.
You have a deployed class that works on an attribute named "stdout". After a while, you think it would be better to check that stdout is really an object with a "write" method at the moment of assignment. Rather than change to a setstdout method (which would be incompatible with deployed code) you would rather trap the assignment and check the object's type.
You want to be as compatible as possible with an object model that has a concept of attribute assignment. It could be the W3C Document Object Model or a particular COM interface (e.g. the PowerPoint interface). In that case you may well want attributes in the model to show up as attributes in the Python interface, even though the underlying implementation may not use attributes at all.
A user wants to make an attribute read-only.
In short, this feature allows programmers to separate the interface of their module from the underlying implementation for whatever purpose. Again, this is not a new feature but merely a new syntax for an existing convention.
To make some attributes read-only:
class foo: def __setattr__( self, name, val ): if name=="readonlyattr": raise TypeError elif name=="readonlyattr2": raise TypeError ... else: self.__dict__["name"]=val
This has the following problems:
- The creator of the method must be intimately aware of whether somewhere else in the class hierarchy __setattr__ has also been trapped for any particular purpose. If so, she must specifically call that method rather than assigning to the dictionary. There are many different reasons to overload __setattr__ so there is a decent potential for clashes. For instance object database implementations often overload setattr for an entirely unrelated purpose.
- The string-based switch statement forces all attribute handlers to be specified in one place in the code. They may then dispatch to task-specific methods (for modularity) but this could cause performance problems.
- Logic for the setting, getting and deleting must live in __getattr__, __setattr__ and __delattr__. Once again, this can be mitigated through an extra level of method call but this is inefficient.
Special methods should declare themselves with declarations of the following form:
class x: def __attr_XXX__(self, op, val ): if op=="get": return someComputedValue(self.internal) elif op=="set": self.internal=someComputedValue(val) elif op=="del": del self.internal
Client code looks like this:
fooval=x.foo x.foo=fooval+5 del x.foo
Attribute references of all three kinds should call the method. The op parameter can be "get"/"set"/"del". Of course this string will be interned so the actual checks for the string will be very fast.
It is disallowed to actually have an attribute named XXX in the same instance as a method named __attr_XXX__.
An implementation of __attr_XXX__ takes precedence over an implementation of __getattr__ based on the principle that __getattr__ is supposed to be invoked only after finding an appropriate attribute has failed.
An implementation of __attr_XXX__ takes precedence over an implementation of __setattr__ in order to be consistent. The opposite choice seems fairly feasible also, however. The same goes for __del_y__.
There is a new object type called an attribute access handler. Objects of this type have the following attributes:
name (e.g. XXX, not __attr__XXX__) method (pointer to a method object)
In PyClass_New, methods of the appropriate form will be detected and converted into objects (just like unbound method objects). These are stored in the class __dict__ under the name XXX. The original method is stored as an unbound method under its original name.
If there are any attribute access handlers in an instance at all, a flag is set. Let's call it "I_have_computed_attributes" for now. Derived classes inherit the flag from base classes. Instances inherit the flag from classes.
A get proceeds as usual until just before the object is returned. In addition to the current check whether the returned object is a method it would also check whether a returned object is an access handler. If so, it would invoke the getter method and return the value. To remove an attribute access handler you could directly fiddle with the dictionary.
A set proceeds by checking the "I_have_computed_attributes" flag. If it is not set, everything proceeds as it does today. If it is set then we must do a dictionary get on the requested object name. If it returns an attribute access handler then we call the setter function with the value. If it returns any other object then we discard the result and continue as we do today. Note that having an attribute access handler will mildly affect attribute "setting" performance for all sets on a particular instance, but no more so than today, using __setattr__. Gets are more efficient than they are today with __getattr__.
The I_have_computed_attributes flag is intended to eliminate the performance degradation of an extra "get" per "set" for objects not using this feature. Checking this flag should have minuscule performance implications for all objects.
The implementation of delete is analogous to the implementation of set.
You might note that I have not proposed any logic to keep the I_have_computed_attributes flag up to date as attributes are added and removed from the instance's dictionary. This is consistent with current Python. If you add a __setattr__ method to an object after it is in use, that method will not behave as it would if it were available at "compile" time. The dynamism is arguably not worth the extra implementation effort. This snippet demonstrates the current behavior:
>>> def prn(*args):print args >>> class a: ... __setattr__=prn >>> a().foo=5 (<__main__.a instance at 882890>, 'foo', 5) >>> class b: pass >>> bi=b() >>> bi.__setattr__=prn >>> b.foo=5
Assignment to __dict__["XXX"] can overwrite the attribute access handler for __attr_XXX__. Typically the access handlers will store information away in private __XXX variables
An attribute access handler that attempts to call setattr or getattr on the object itself can cause an infinite loop (as with __getattr__) Once again, the solution is to use a special (typically private) variable such as __XXX.
The descriptor mechanism described in PEP 252 is powerful enough to support this more directly. A 'getset' constructor may be added to the language making this possible:
class C: def get_x(self): return self.__x def set_x(self, v): self.__x = v x = getset(get_x, set_x)
Additional syntactic sugar might be added, or a naming convention could be recognized.