a question on python dict

Tim Peters tim.peters at gmail.com
Thu Dec 21 02:24:31 EST 2006


[could.net at gmail.com]
> Python dict is a hash table, isn't it?

Yup.

> I know that hashtable has the concept of "bucket size" and "min bucket
> count" stuff,

Some implementations of hash tables do.  Python's does not.  Python's
uses what's called "open addressing" instead.

> and they should be configurable so I can set them to the proper value
> when I know how many items I'm going to handle.
>
> If these two values can't be set, the hashtable will give them default
> values. When there are more and more items being added to the
> hashtable, it increase its buckets and copy the old items into the new
> one and re-calculate the hash value of each item.

That's several distinct claims, each of which is true of some hash
table implementations but not of others.  Python's implementation has
no "buckets", so all that's simply irrelevant.  Python's
implementation (not all do) saves the hash value of each item, so when
it does need to grow (or shrink) the space allocated to the table, it
does not need to recalculate the hash value of any item.

You should also note that copying a dict key or value (no matter of
what type) consists in its entirety of copying one machine address (a
4- or 8-byte pointer, depending on platform).

> I think this will waste some time doing the increasing-copying thing.

It does take time to resize.  "Waste" is a value judgment ;-)

> If I know I'm going to handle about 20000 items, I can set the size of
> hashtable to 30000.
>
> So, can I do this in python?

You can't.  Unless you build up to 20000 items over and over (many
thousands of times), it's unlikely you'd be able to measure the time
difference even if you could.



More information about the Python-list mailing list