[Python-Dev] Simple dicts

Greg Ward gward@python.net
Thu, 15 May 2003 09:39:27 -0400


On 14 May 2003, Brett C. said:
> When I took data structures I was taught that chaining was actually the 
> easiest way to do hash tables and they still had good performance 
> compared to open addressing.  Because of this taught bias I always 
> wondered why Python used open addressing; can someone tell me?

If your nodes are small, chaining has a huge overhead -- an extra
pointer for each node in a chain.  You can play around with glomming
several nodes together to amortize the cost of those pointers, but ISTR
the win isn't that big.

Open addressing is more memory-efficient, but when the hash table fills
(or gets close to full), you absolutely positively have to rehash.

(Back in January, I played around with writing a custom hash table for
keeping ZODB indexes in memory without using a Python dict, so that's
why I'm fairly fresh on hash table minutiae.)

        Greg
-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
NOBODY expects the Spanish Inquisition!