For Large Dictionaries Could One Use Separate Dictionaries Where Each Dictionary Covers an Interval of the Input Range?

skip at pobox.com
Tue May 16 20:39:28 EDT 2006


    Roy> If you're getting long hash chains, you're either using a bad hash
    Roy> function, or your table isn't big enough.

Sure.  The original poster said something about 10 million keys I think.
Unless the key distribution is amazingly fortuitous and the number of unique
values is small, the dictionary is going to have a huge memory footprint.

On my Mac laptop this code:

    >>> d = {}
    >>> for n in xrange(10**7):
    ...   d[n] = hex(n)
    ... 

yields a process with 647MB of VM.  If I trim that to 1000 unique values:

    >>> d = {}
    >>> for n in xrange(10**7):
    ...   d[n] = hex(n % 1000)
    ... 

I still wind up with 647MB VM.  The dictionary's hash table and its keys are
what consume all the memory; the 1000 shared value strings are negligible, and
those keys are as simple as you can get.
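As a rough sketch of where the bytes go (modern Python 3 syntax, so range/hex
instead of xrange; sizes are CPython-specific shallow sizes from
sys.getsizeof, and n is shrunk to 10**5 just to keep the demo quick):

```python
import sys

# Build a dict like the second example above: distinct int keys,
# only 1000 unique string values.
n = 10**5
d = {i: hex(i % 1000) for i in range(n)}

# Shallow size of the dict's own hash table (not its contents).
table_bytes = sys.getsizeof(d)

# Each key is a distinct int object; the values collapse to 1000 shared strings.
key_bytes = sum(sys.getsizeof(k) for k in d)
value_bytes = sum(sys.getsizeof(v) for v in set(d.values()))

print("table:", table_bytes, "keys:", key_bytes, "values:", value_bytes)
```

Both the table and the keys come out orders of magnitude larger than the
shared values, which is why collapsing the values to 1000 strings barely
moves the footprint.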

Skip



