[Python-Dev] Speeding up instance attribute access

Guido van Rossum guido@python.org
Fri, 08 Feb 2002 13:09:27 -0500


Inspired by the second half of Jeremy's talk at DevDay, here's my
alternative approach for speeding up instance attribute access.  Like
my idea for globals, it uses double indirection rather than
recompilation.


- We only care about attributes of 'self' (which is identified as the
  first argument of a method, not by name).  We can exclude from our
  analysis any function that assigns to self -- this is extremely
  rare and would throw off our analysis.  We should also exclude
  static methods and class methods, since their first argument
  doesn't have the same role.

- Static analysis of the source code of a class (without access to the
  base class) can determine attributes of the class, and to some extent
  instance variables.  Without also analyzing the base classes, this
  analysis cannot reliably distinguish between instance variables and
  methods inherited from a base class; it can distinguish between
  instance variables and methods defined in the current class.

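- We can guess the status of inherited attributes that are never
  assigned to by seeing whether they are called or not: one that is
  called is probably a method, one that is not is probably an
  instance variable.  This is not 100% accurate, so we need things to
  work (if slower) even when we guess wrong.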

- For instance variable references and stores of the form self.<name>,
  the bytecode compiler emits opcodes LOAD_SELF_IVAR <i> and
  STORE_SELF_IVAR <i>, where <i> is a small int identifying the
  instance variable (ivar).  A particular ivar is identified by the
  same <i> throughout all methods defined in the same class statement,
  but there is no attempt to coordinate this across different classes
  related by inheritance.

- It would be nice if we also had a single-opcode way to express a
  method call on self, e.g. CALL_SELF_METHOD <i>, <n>, <k> where <i>
  identifies the method like above, and <n> and <k> are the number of
  positional and keyword arguments.  Or maybe we should just have
  LOAD_SELF_METHOD <i> which may be able to skip looking in the
  instance dict.

- Some data structure describing the mapping from <i> to attribute
  name, and whether it's an ivar or a method, is produced by the
  compiler and stored in the class __dict__.  The function objects
  representing methods also contain a pointer to this data structure.
  (Or the code objects?  But it needs to be shared.  Details, details.)

- When a class object is created (at run-time), another data structure
  is created that accumulates the <i>-to-name mappings from that class
  and all its base classes.
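
To make the static analysis in the bullets above more concrete, here
is a minimal sketch in present-day Python using the standard ast
module.  It is an illustration under stated assumptions, not the
proposed implementation: the names analyze_class and accumulate, the
"(guess)" labels, and the choice to number attributes in sorted order
are all hypothetical.  It detects static and class methods by
decorator (a later syntax), ignores nested scopes, and the shape of
the accumulated per-class table is only one possible reading of the
last bullet.

```python
import ast


def _is_self_attr(node, self_name):
    """True if node is an attribute access of the form <self>.<name>."""
    return (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == self_name)


def analyze_class(source):
    """Map each self.<name> in the first class of source to (i, kind)."""
    tree = ast.parse(source)
    cls = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    funcs = [n for n in cls.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in funcs}      # methods defined in this class
    names, stored, called = set(), set(), set()
    for func in funcs:
        # Assumption: static/class methods are marked with a decorator.
        decorators = {d.id for d in func.decorator_list
                      if isinstance(d, ast.Name)}
        if decorators & {"staticmethod", "classmethod"}:
            continue
        if not func.args.args:
            continue
        self_name = func.args.args[0].arg  # first argument, whatever its name
        # Exclude functions that rebind their first argument.
        if any(isinstance(n, ast.Name) and n.id == self_name
               and isinstance(n.ctx, (ast.Store, ast.Del))
               for n in ast.walk(func)):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and _is_self_attr(node.func, self_name):
                called.add(node.func.attr)
            if _is_self_attr(node, self_name):
                names.add(node.attr)
                if isinstance(node.ctx, ast.Store):
                    stored.add(node.attr)
    table = {}
    for i, name in enumerate(sorted(names)):   # hypothetical numbering
        if name in defined:
            kind = "method"
        elif name in stored:
            kind = "ivar"
        elif name in called:
            kind = "method (guess)"            # inherited and called
        else:
            kind = "ivar (guess)"              # inherited and never called
        table[name] = (i, kind)
    return table


def accumulate(tables):
    """Merge per-class tables (class first, then bases) into name -> slot.

    Purely a guess at what the run-time structure might hold.
    """
    combined = {}
    for table in tables:
        for name in sorted(table, key=lambda n: table[n][0]):
            combined.setdefault(name, len(combined))
    return combined


EXAMPLE = '''
class Point(Base):
    def __init__(self, x):
        self.x = x
        self.reset()
    def norm(self):
        return abs(self.x) + self.scale
    def show(self):
        print(self.norm(), self.describe())
    @staticmethod
    def make(arg):
        return arg.hidden
    def odd(self):
        self = None
        return self.weird
'''

table = analyze_class(EXAMPLE)
```

Here x is stored to and so counts as an ivar, norm is defined in the
class and so is a method, reset and describe are guessed to be
inherited methods because they are called, and scale is guessed to be
an inherited ivar.  Neither hidden nor weird appears in the table,
since make is a static method and odd assigns to self.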