[Python-Dev] Instance variable access and descriptors

Eyal Lotem eyal.lotem at gmail.com
Sat Jun 9 23:23:41 CEST 2007


Hi.

I was surprised to find in my profiling that instance variable access
was pretty slow.

I looked through the CPython code involved, and discovered something
that really surprises me.

Python, probably through the valid assumption that most attribute
lookups go to the class, tries to look for the attribute in the class
first, and in the instance, second.

What Python currently does is quite peculiar!
Here's a short description of PyObject_GenericGetAttr:

A. Python looks for a descriptor in the _entire_ mro hierarchy
(len(mro) class/type check and dict lookups).
B. If Python found a descriptor and it has both get and set functions
- it uses it to get the value and returns, skipping the next stage.
C. If Python either did not find a descriptor, or found one that has
no setter, it will try a lookup in the instance dict.
D. If Python failed to find it in the instance, it will use the
descriptor's getter, and if it has no getter it will use the
descriptor itself.
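
Steps A to D above can be approximated in pure Python. This is only a
rough sketch, not the actual C implementation: the function name
generic_getattr is made up for illustration, it assumes the instance
has an ordinary __dict__, and it treats "has a setter" as "the
descriptor's type defines __set__".

```python
def generic_getattr(obj, name):
    """Approximate the lookup order described in steps A to D."""
    tp = type(obj)
    missing = object()  # sentinel, since None is a valid attribute value

    # Step A: search the class dicts along the whole MRO.
    descr = missing
    for klass in tp.__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            break

    getter = None
    if descr is not missing:
        descr_type = type(descr)
        getter = getattr(descr_type, "__get__", None)
        # Step B: a descriptor with both a getter and a setter wins outright.
        if getter is not None and hasattr(descr_type, "__set__"):
            return getter(descr, obj, tp)

    # Step C: otherwise try the instance dict.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]

    # Step D: fall back to the descriptor's getter, or the object itself.
    if descr is not missing:
        if getter is not None:
            return getter(descr, obj, tp)
        return descr

    raise AttributeError(name)
```

Note that a plain instance variable is only found in step C, after the
whole MRO walk of step A has already been paid for.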


I believe the average cost of A is much higher than that of C, because
there is just one instance dict to look through, while there are
len(mro) dicts to search for a descriptor. The instance dict is also
typically smaller than the class dicts, which matters in the rare
worst-case timings of hash lookups.

This means that for simple instance variable lookups, Python is paying
the full mro lookup price!
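
One way to check this claim is to time plain instance variable access
on classes with a short and a long mro. The following is a minimal
sketch; the helper names make_deep_class and time_instance_access are
hypothetical, and the numbers depend heavily on the interpreter
(CPython versions that cache type lookups may show little or no
difference).

```python
import timeit


def make_deep_class(depth):
    """Build a chain of `depth` empty classes, each inheriting the last."""
    cls = type("Level0", (object,), {})
    for i in range(1, depth):
        cls = type("Level%d" % i, (cls,), {})
    return cls


def time_instance_access(cls, number=200000):
    """Time `number` reads of a plain instance variable on a cls instance."""
    obj = cls()
    obj.x = 1
    return timeit.timeit("obj.x", globals={"obj": obj}, number=number)


shallow = make_deep_class(1)
deep = make_deep_class(50)
print("len(mro) = %d: %.4f s" % (len(shallow.__mro__), time_instance_access(shallow)))
print("len(mro) = %d: %.4f s" % (len(deep.__mro__), time_instance_access(deep)))
```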

I believe that this should be changed, so that Python first looks for
the attribute in the instance's dict and only then through the class's
mro.
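
As a rough illustration of the proposed order (instance dict first,
then the mro), here is a pure-Python sketch. The name proposed_getattr
is hypothetical, and the function assumes instances with an ordinary
__dict__; it is not a patch for the C code.

```python
def proposed_getattr(obj, name):
    """Sketch of the proposed order: instance dict first, then the MRO."""
    # Instance attributes always take precedence.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]

    # Only on a miss do we walk the class dicts along the MRO.
    tp = type(obj)
    for klass in tp.__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            getter = getattr(type(descr), "__get__", None)
            if getter is not None:
                return getter(descr, obj, tp)
            return descr

    raise AttributeError(name)
```

With this order a simple instance variable costs a single dict lookup,
regardless of how long the mro is.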

This will have the following effects:

A. It will break code that uses instance.__dict__['var'] directly,
when 'var' exists as a property with a __set__ in the class. I believe
this is not significant.
B. It will simplify getattr's semantics. Python should _always_ give
precedence to instance attributes over class ones, rather than have
very weird special-cases (such as a property with a __set__).
C. It will greatly speed up instance variable access, especially when
the class has a large mro.
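
The special case behind A and B can be seen with current behaviour. In
this small example (class names Data, NonData and C are made up for
illustration), a descriptor with a __set__ beats the instance dict,
while one without a __set__ does not:

```python
class Data:
    """Descriptor with both __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonData:
    """Descriptor with only __get__."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class C:
    a = Data()
    b = NonData()


c = C()
# Write to the instance dict directly, bypassing any __set__.
c.__dict__["a"] = "from instance"
c.__dict__["b"] = "from instance"

print(c.a)  # from data descriptor
print(c.b)  # from instance
```

Under the proposed change both lookups would return "from instance".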

Obviously, I think the code breakage is the worst problem. This could
probably be addressed by a transition version in which Python warns
about any instance attributes that also exist in the mro as
descriptors.

What do you think?

