Lookup caching

MRAB google at mrabarnett.plus.com
Tue Dec 12 17:55:03 EST 2006


Andrea Griffini wrote:
> Gabriel Genellina wrote:
> > At Saturday 9/12/2006 23:04, Andrea Griffini wrote:
> >
> >> I implemented that crazy idea and seems working... in its
> >> current hacked state can still pass the test suite (exluding
> >
> > What crazy idea? And what is this supposed to do?
> >
> The idea is to avoid looking up constants several times
> into dictionaries that didn't change (e.g. builtins).
>
> Reading a bit about other optimization proposals I didn't
> find a similar one so I decided to invest some spare time
> in it. The idea is
>
> 1) Add a "timestamp" to dictionaries, so when a dictionary
>     is changed the timestamp gets updated
>
> 2) Store a cached lookup for constants; the cached lookup
>     is stored as a timestamp value and a naked pointer to
>     the result. The algorithm for the lookup of a given
>     constant is:
>
>             if ( <<cached_timestamp>> == d->timestamp)
>             {
>                x = <<cached_value>>;
>             }
>             else
>             {
>                x = PyDict_GetItem(d, key);
>                <<cached_timestamp>> = d->timestamp;
>                <<cached_value>> = x;
>             }
>
>     using a naked pointer is safe because it will be used
>     only if the dictionary wasn't touched, hence the value
>     is surely still alive.
>
> The original place I thought about where to store the
> cached lookup was the bytecode, however after reading
> python sources I resorted instead to a dedicated space
> inside the code object. The code for LOAD_GLOBAL uses
> something like
>
>     if (co->co_cachedtstamps[oparg] == d->timestamp)
>          ...
>
> i.e. I used an array indexed by the index of the co_name
> used for lookups.
>
> The patched code is currently working, however I found
> that while the hit/miss ratios are impressive (as I
> expected) the speedup is simply absent. Moreover there
> is no difference at all between paying for the timestamp
> handling and NOT using the cached lookups or instead
> paying AND using the cached lookups (!).
> Absurdely python on my PC runs faster if I in addition
> to the cached lookup code also leave in place the
> hit/miss statistics (a few static ints increment and
> a static atexit-ed output function).
> Also it made a lot of difference about where the
> timestamp was placed inside the dictobject structure...
>
> In addition to the not impressive results (my patched
> python now is just a bit *slower* than original one :-D)
> there is also another complication. The LOAD_GLOBAL
> actually needs TWO lookups, so I used two cached results
> (one for globals, one for builtins).
> The ideal solution however IMO would be in this case
> to have two timestamps and one cached value instead...
> (if neither dict was touched since last lookup then the
> result will be the cached one).
>
[snip]
What are you using for the timestamp? Are you calling a function to
read a timer?

If so, you could try something that's 'cheaper' like a modification
counter instead, ie a counter that's incremented each time the dict is
modified.




More information about the Python-list mailing list