Large Dictionaries

Chris Foote chris at foote.com.au
Tue May 16 05:08:45 EDT 2006


Paul McGuire wrote:
> "Claudio Grondi" <claudio.grondi at freenet.de> wrote in message
> news:e49va0$m72$1 at newsreader3.netcologne.de...
>> Chris Foote wrote:
>>> Hi all.
>>>
>>> I have the need to store a large (10M) number of keys in a hash table,
>>> based on a tuple of (long_integer, integer).  The standard python
>>> dictionary works well for small numbers of keys, but starts to
>>> perform badly for me inserting roughly 5M keys:
>>>
>>> # keys   dictionary  metakit   (both using psyco)
>>> ------   ----------  -------
>>> 1M            8.8s     22.2s
>>> 2M           24.0s     43.7s
>>> 5M          115.3s    105.4s
>>>
>>> Has anyone written a fast hash module which is more optimal for
>>> large datasets ?
>>>
>>> p.s. Disk-based DBs are out of the question because most
>>> key lookups will result in a miss, and lookup time is
>>> critical for this application.
>>>
>> Python Bindings (\Python24\Lib\bsddb vers. 4.3.0) and the DLL for
>> BerkeleyDB (\Python24\DLLs\_bsddb.pyd vers. 4.2.52) are included in the
>> standard Python 2.4 distribution.
>>
>> "Berkeley DB was 20 times faster than other databases. It has the
>> operational speed of a main memory database, the startup and shut down
>> speed of a disk-resident database, and does not have the overhead of
>> a client-server inter-process communication."
>> Ray Van Tassle, Senior Staff Engineer, Motorola
>>
>> Please let me/us know if it is what you are looking for.
>
> sqlite also supports an in-memory database - use pysqlite
> (http://initd.org/tracker/pysqlite/wiki) to access this from Python.

Hi Paul.

I tried that, but the overhead of parsing SQL queries is too high:

                dictionary  metakit  sqlite[1]
                ----------  -------  ---------
1M numbers            8.8s    22.2s      89.6s
2M numbers           24.0s    43.7s     190.0s
5M numbers          115.3s   105.4s        N/A

Thanks for the suggestion, but no go.

Cheers,
Chris

[1] pysqlite V1 & sqlite V3.
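
For anyone wanting to try a comparison of this kind, here is a minimal
sketch. It is not the script behind the timings above. It assumes the
standard-library sqlite3 module rather than pysqlite V1, and the key
generator, the table name "entries" and all function names are made up
for illustration, since the real keys are not shown in this thread.

```python
import sqlite3
import time


def make_keys(n):
    """Yield n hypothetical (long_integer, integer) keys (assumed shape)."""
    for i in range(n):
        yield (i * 1000003, i % 256)


def time_dict(n):
    """Insert n tuple keys into a plain dict; return (dict, seconds)."""
    start = time.perf_counter()
    table = {}
    for key in make_keys(n):
        table[key] = True
    return table, time.perf_counter() - start


def time_sqlite(n):
    """Insert n keys into an in-memory SQLite table; return (conn, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (a INTEGER, b INTEGER, PRIMARY KEY (a, b))"
    )
    start = time.perf_counter()
    # Parameterised statement, so the SQL text is not rebuilt per row.
    conn.executemany("INSERT INTO entries (a, b) VALUES (?, ?)", make_keys(n))
    conn.commit()
    return conn, time.perf_counter() - start


def sqlite_has_key(conn, key):
    """Look up one key; in this use case most lookups are expected to miss."""
    row = conn.execute(
        "SELECT 1 FROM entries WHERE a = ? AND b = ?", key
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    n = 100000  # kept small here; the thread is about 1M to 10M keys
    table, dict_secs = time_dict(n)
    conn, sqlite_secs = time_sqlite(n)
    print("dict:   %.2fs" % dict_secs)
    print("sqlite: %.2fs" % sqlite_secs)
    # One lookup that misses in both structures.
    print((-1, 0) in table, sqlite_has_key(conn, (-1, 0)))
```

Absolute times will depend on the machine and the Python and SQLite
versions, so this only gives a rough feel for the relative cost of the
two approaches.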


