Lookup caching

Andrea Griffini agriff at tin.it
Mon Dec 11 17:19:20 EST 2006


Gabriel Genellina wrote:
> At Saturday 9/12/2006 23:04, Andrea Griffini wrote:
> 
>> I implemented that crazy idea and it seems to be working... in its
>> current hacked state it can still pass the test suite (excluding
> 
> What crazy idea? And what is this supposed to do?
> 
The idea is to avoid looking up constants several times in
dictionaries that didn't change (e.g. builtins).

Reading a bit about other optimization proposals, I didn't find
a similar one, so I decided to invest some spare time in it.
The idea is:

1) Add a "timestamp" to dictionaries, so that whenever a dictionary
    is changed its timestamp gets updated (a sketch of this follows
    the list)

2) Store a cached lookup for constants; the cached lookup is
    stored as a timestamp value plus a naked pointer to the
    result. The lookup algorithm for a given constant is:

            if (<<cached_timestamp>> == d->timestamp)
            {
               /* dict untouched since the cache was filled:
                  reuse the cached result, no lookup at all */
               x = <<cached_value>>;
            }
            else
            {
               /* full lookup, then refresh the cache */
               x = PyDict_GetItem(d, key);
               <<cached_timestamp>> = d->timestamp;
               <<cached_value>> = x;
            }

    Using a naked pointer is safe because it is used only if the
    dictionary wasn't touched since the cache was filled, hence
    the value is certainly still alive.
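
On the dictionary side this amounts to something like the sketch
below (names and details are illustrative, not the actual patch).
Feeding the field from a single global counter means no two dicts
ever hold the same timestamp at the same time, so a cached timestamp
can only ever match the one dict it came from:

    /* one global counter; bumped on every dict creation/mutation */
    static unsigned long dict_global_timestamp = 0;

    typedef struct {
        PyObject_HEAD
        /* ... the existing PyDictObject fields ... */
        unsigned long timestamp;   /* counter value at last touch */
    } PyDictObject;

    /* called at dict creation and from every mutating entry point
       (PyDict_SetItem, PyDict_DelItem, PyDict_Clear, ...) */
    #define DICT_TOUCH(mp) ((mp)->timestamp = ++dict_global_timestamp)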

The original place I thought of for storing the cached lookup was
the bytecode itself; however, after reading the Python sources I
resorted instead to a dedicated space inside the code object.
The code for LOAD_GLOBAL uses something like

    if (co->co_cachedtstamps[oparg] == d->timestamp)
         ...

i.e. an array indexed by the same index that selects the name
from co_names for the lookup.
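
Spelled out a bit more, the fast path looks roughly like this (a
sketch only: co_cachedvalues is a name I'm inventing here for the
companion value array, and error handling is omitted):

    case LOAD_GLOBAL:
    {
        PyDictObject *d = (PyDictObject *)f->f_globals;
        w = GETITEM(names, oparg);          /* name from co_names */
        if (co->co_cachedtstamps[oparg] == d->timestamp) {
            /* hit: globals untouched since last lookup */
            x = co->co_cachedvalues[oparg];
        }
        else {
            /* miss: real lookup, then refresh the cache */
            x = PyDict_GetItem(f->f_globals, w);
            co->co_cachedtstamps[oparg] = d->timestamp;
            co->co_cachedvalues[oparg] = x;  /* naked pointer */
        }
        /* if x == NULL, fall back to the builtins lookup */
    }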

The patched code is currently working; however, I found that while
the hit/miss ratios are impressive (as I expected), the speedup is
simply absent. Moreover, there is no difference at all between
paying for the timestamp handling while NOT using the cached
lookups, and paying for it while using them (!).
Absurdly, Python on my PC runs faster if, in addition to the
cached lookup code, I also leave in place the hit/miss statistics
(a few static int increments and a static atexit-ed output
function).
Where the timestamp was placed inside the dictobject structure
also made a lot of difference...

In addition to the unimpressive results (my patched Python is now
just a bit *slower* than the original one :-D) there is also
another complication. LOAD_GLOBAL actually needs TWO lookups, so
I used two cached results (one for globals, one for builtins).
The ideal solution in this case, however, would IMO be two
timestamps and one cached value: if neither dict was touched since
the last lookup, the result is the cached one.
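
Something like this sketch (again with invented names for the cache
arrays; note that a NULL result can be cached too, since a hit would
just raise NameError again, which is the correct behaviour):

    PyDictObject *dg = (PyDictObject *)f->f_globals;
    PyDictObject *db = (PyDictObject *)f->f_builtins;
    if (co->co_cached_globals_ts[oparg] == dg->timestamp &&
        co->co_cached_builtins_ts[oparg] == db->timestamp) {
        /* hit: neither dict touched since the caches were filled */
        x = co->co_cachedvalues[oparg];
    }
    else {
        /* miss: the usual two-step lookup, then refresh both stamps */
        x = PyDict_GetItem(f->f_globals, w);
        if (x == NULL)
            x = PyDict_GetItem(f->f_builtins, w);
        co->co_cached_globals_ts[oparg] = dg->timestamp;
        co->co_cached_builtins_ts[oparg] = db->timestamp;
        co->co_cachedvalues[oparg] = x;
    }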

The complication is that a lot of lookups are done by LOAD_ATTR
instead, and thinking about the ideal solution for new-style
classes made my brain explode (MRO, descriptors and so on...).
It would be simple to do something for classic classes, but would
that be worthwhile (I mean... aren't those going to disappear?).

Probably something can be done with caches for LOAD_ATTR on
modules (to speed up things like math.sin or mod1.mod2.mod3.func).
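
For modules it could look roughly like the sketch below (all the
cache names are invented). Unlike LOAD_GLOBAL, the same co_names
index can be used against different objects, so the cache also has
to remember which object it was filled from; the globally unique
timestamps make a stale hit through a recycled pointer impossible:

    w = GETITEM(names, oparg);                /* attribute name */
    v = TOP();                                /* object being accessed */
    if (PyModule_Check(v)) {
        PyDictObject *md = (PyDictObject *)PyModule_GetDict(v);
        if (co->co_attr_cached_obj[oparg] == v &&
            co->co_attr_cached_ts[oparg] == md->timestamp) {
            x = co->co_attr_cached_val[oparg];       /* hit */
        }
        else {
            x = PyDict_GetItem((PyObject *)md, w);   /* miss */
            co->co_attr_cached_obj[oparg] = v;
            co->co_attr_cached_ts[oparg] = md->timestamp;
            co->co_attr_cached_val[oparg] = x;
        }
        /* if x == NULL, fall back to the normal getattr path */
    }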

Any suggestion is welcome...

Andrea


