Are dictionaries the same as hashtables?

Diez B. Roggisch deets at nospam.web.de
Tue Aug 26 09:06:21 EDT 2008


John Machin wrote:

> On Aug 26, 7:36 pm, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
>> Martin Marcher wrote:
>> > On 2008-08-26 00:32:20, cnb wrote:
>> >> Are dictionaries the same as hashtables?
>>
>> > Yes, but there is nothing in there that does sane collision handling
>> > like making a list instead of simply overwriting.
>>
>> The term "collision" is rather well defined when talking about
>> associative arrays using hashing (which python's dicts are): it means
>> that two distinct keys produce the same hash-value, leading to a bucket
>> collision. And of course python deals with that sanely, and creates a
>> list of objects inside the bucket which is traversed and comparison is
>> used to determine which actual value to retrieve.
> 
> I see neither buckets nor lists in the CPython svn trunk version of
> dictobject.c and IIRC it's been using open addressing for a very long
> time.

I don't know the exact names of the involved structures - I named them
liberally from my understanding of how associative arrays based on hashing
are implemented. But the below code shows that hash-collisions can occur
without corrupting data, while OTOH of course degenerating the lookup from
usually O(1) to O(n). Try playing around with it, commenting out the proper
hashing of the key will lead to great performance enhancements.

Diez

import pprint

class StupidThing(object):

    def __init__(self, key):
        self.key = key


    def __hash__(self):
        #return hash(self.key)
        return 1


    def __cmp__(self, other):
        return cmp(self.key, other.key)


    def __repr__(self):
        return "<%s>" % self.key


d = {}

for a in xrange(1000):
    d[StupidThing(str(a))] = a



pprint.pprint(d)




More information about the Python-list mailing list