Large Dictionaries
Chris Foote
chris at foote.com.au
Mon May 15 23:13:30 EDT 2006
Roy Smith wrote:
> In article <1147699064.107490 at teuthos>, Chris Foote <chris at foote.com.au>
> wrote:
>
>> I have the need to store a large (10M) number of keys in a hash table,
>> based on a tuple of (long_integer, integer). The standard python
>> dictionary works well for small numbers of keys, but starts to
>> perform badly for me inserting roughly 5M keys:
>>
>> # keys dictionary metakit (both using psyco)
>> ------ ---------- -------
>> 1M 8.8s 22.2s
>> 2M 24.0s 43.7s
>> 5M 115.3s 105.4s
>
> Are those clock times or CPU times?
User + system CPU time.
> How much memory is your process using and how much is available on your
> machine?
A very large amount of space is consumed, which is why I didn't give
times for the 10M version which would have eaten my 1G of RAM and
into swap :-)
> I'm guessing a integer takes 4 bytes and a long integer takes roughly one
> byte per two decimal digits. Plus a few more bytes to bundle them up into
> a tuple. You've probably got something like 20 bytes per key, so 5M of
> them is 100 meg just for the keys.
>
> To get reasonable hashing performance, the hash table needs to be maybe
> half full, and figure a hash key is 4 bytes, so that's another 40 meg for
> the hash table itself.
>
> Plus whatever the values stored in the dictionary take up. Even if you're
> not storing any values (i.e., there's probably 4 bytes for a null pointer
> (or reference to None), so that's another 40 meg.
>
> These are all vague guesses, based on what I think are probably
> conservative estimates of what various objects must take up in memory, but
> you see where this is going. We're already up to 180 meg.
Yep, that size sounds about right (my dictionary values are all None).
> I wonder if the
> whole problem is that you're just paging yourself to death. A few minutes
> watching your system memory performance with ps or top while your program
> is running might give you some idea if this is the case.
Definitely not.
Thanks anyway,
Chris
More information about the Python-list
mailing list