[Python-Dev] PyBench DictCreation (was Re: Performance compares)

Tim Peters tim.one@home.com
Fri, 18 May 2001 15:48:33 -0400


[Neil Schemenauer]
> A table of keys and values.  Values are looked up by looping over
> the table comparing each key until the correct one is found (ie.
> its O(n) where n is the size of the table).  For Python, the cost
> of doing compares probably outweighs the cost of doing the
> hashing, even for small tables.

I thought about that before.  The inlining appeals but the algorithm not
much:  the dict implementation *as is* loops over all the table entries too,
except that instead of starting with "i = 0" it starts (now) with "i = hash &
mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^=
poly"; and instead of giving up when "i >= length" it punts when finding an
entry with a null value.  Incrementing via ++i is certainly cheaper, except
that even when small, the hash table usually hits on the first try when the
key is present, so usually gets out before incrementing.

> Its not clear to me though if it would be a win.

Best guess is not.

> Assuming that interned strings are the most common key, a assocation
> table with four entries would take on average two pointer compares
> to look up a value.

Actually an average of 2.5 when the key is present and each key is equally
likely to be queried, and always 4 when the queried key is not present.  The
hash table has better expected stats on both counts, but needs 4 unused slots
too to achieve that.  The savings would be in memory for small dicts more
than in time (if at all).