Large Dictionaries

Chris Foote chris at foote.com.au
Mon May 15 23:13:30 EDT 2006


Roy Smith wrote:
> In article <1147699064.107490 at teuthos>, Chris Foote <chris at foote.com.au> 
> wrote:
> 
>> I have the need to store a large (10M) number of keys in a hash table,
>> based on a tuple of (long_integer, integer).  The standard python
>> dictionary works well for small numbers of keys, but starts to
>> perform badly for me inserting roughly 5M keys:
>>
>> # keys   dictionary  metakit   (both using psyco)
>> ------   ----------  -------
>> 1M            8.8s     22.2s
>> 2M           24.0s     43.7s
>> 5M          115.3s    105.4s
> 
> Are those clock times or CPU times?

User + system CPU time.

> How much memory is your process using and how much is available on your 
> machine?

A very large amount of space is consumed, which is why I didn't give 
times for the 10M version which would have eaten my 1G of RAM and
into swap :-)

> I'm guessing a integer takes 4 bytes and a long integer takes roughly one 
> byte per two decimal digits.  Plus a few more bytes to bundle them up into 
> a tuple.  You've probably got something like 20 bytes per key, so 5M of 
> them is 100 meg just for the keys.
> 
> To get reasonable hashing performance, the hash table needs to be maybe 
> half full, and figure a hash key is 4 bytes, so that's another 40 meg for 
> the hash table itself.
> 
> Plus whatever the values stored in the dictionary take up.  Even if you're 
> not storing any values (i.e., there's probably 4 bytes for a null pointer 
> (or reference to None), so that's another 40 meg.
> 
> These are all vague guesses, based on what I think are probably 
> conservative estimates of what various objects must take up in memory, but 
> you see where this is going.  We're already up to 180 meg.

Yep, that size sounds about right (my dictionary values are all None).

> I wonder if the 
> whole problem is that you're just paging yourself to death.  A few minutes 
> watching your system memory performance with ps or top while your program 
> is running might give you some idea if this is the case.

Definitely not.

Thanks anyway,

Chris




More information about the Python-list mailing list