Large Dictionaries
Chris Foote
chris at foote.com.au
Tue May 16 05:08:45 EDT 2006
Paul McGuire wrote:
> "Claudio Grondi" <claudio.grondi at freenet.de> wrote in message
> news:e49va0$m72$1 at newsreader3.netcologne.de...
>> Chris Foote wrote:
>>> Hi all.
>>>
>>> I have the need to store a large (10M) number of keys in a hash table,
>>> based on a tuple of (long_integer, integer). The standard python
>>> dictionary works well for small numbers of keys, but starts to
>>> perform badly for me inserting roughly 5M keys:
>>>
>>> # keys   dictionary   metakit   (both using psyco)
>>> ------   ----------   -------
>>> 1M       8.8s         22.2s
>>> 2M       24.0s        43.7s
>>> 5M       115.3s       105.4s
>>>
>>> Has anyone written a fast hash module which is better suited to
>>> large datasets?
>>>
>>> p.s. Disk-based DBs are out of the question because most
>>> key lookups will result in a miss, and lookup time is
>>> critical for this application.
>>>
>> Python Bindings (\Python24\Lib\bsddb vers. 4.3.0) and the DLL for
>> BerkeleyDB (\Python24\DLLs\_bsddb.pyd vers. 4.2.52) are included in the
>> standard Python 2.4 distribution.
>>
>> "Berkeley DB was 20 times faster than other databases. It has the
>> operational speed of a main memory database, the startup and shut down
>> speed of a disk-resident database, and does not have the overhead of
>> a client-server inter-process communication."
>> Ray Van Tassle, Senior Staff Engineer, Motorola
>>
>> Please let me/us know if it is what you are looking for.
>
> sqlite also supports an in-memory database - use pysqlite
> (http://initd.org/tracker/pysqlite/wiki) to access this from Python.
Hi Paul.
I tried that, but the overhead of parsing SQL queries is too high:
             dictionary   metakit   sqlite[1]
             ----------   -------   ---------
1M numbers   8.8s         22.2s     89.6s
2M numbers   24.0s        43.7s     190.0s
5M numbers   115.3s       105.4s    N/A
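For reference, the sqlite column reflects an in-memory database. With the sqlite3 module that later absorbed pysqlite into the standard library, the setup looks roughly like this (the schema and key-generation scheme here are made up for illustration; the original test's schema isn't shown in this thread):

```python
import sqlite3  # pysqlite was later folded into the stdlib as sqlite3

# Open a purely in-memory database -- no disk I/O involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keys (hi INTEGER, lo INTEGER, PRIMARY KEY (hi, lo))")

# Insert a batch of (long_integer, integer) key pairs.
pairs = [(i * 2 ** 32, i) for i in range(1000)]
conn.executemany("INSERT INTO keys VALUES (?, ?)", pairs)
conn.commit()

# Lookups: in this workload, most of them would be misses.
hit = conn.execute("SELECT 1 FROM keys WHERE hi = ? AND lo = ?",
                   (2 ** 32, 1)).fetchone()
miss = conn.execute("SELECT 1 FROM keys WHERE hi = ? AND lo = ?",
                    (7, 7)).fetchone()
```

Even with parameterized statements, every execute still goes through SQL parsing and the virtual machine, which is where the overhead above comes from.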
Thanks for the suggestion, but no go.
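For comparison, the plain-dictionary timings above come from straight insertion of tuple keys, roughly like this (the key-generation scheme and counts here are hypothetical, just to show the shape of the benchmark):

```python
import time

def build(n):
    """Insert n (long_integer, integer) tuple keys into a plain dict."""
    d = {}
    for i in range(n):
        d[(i * 2 ** 40, i % 1000)] = i  # hypothetical key scheme
    return d

start = time.time()
d = build(100000)
elapsed = time.time() - start
print("%d keys in %.2fs" % (len(d), elapsed))

# A lookup that misses is still O(1) on average:
assert (1, 1) not in d
```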
Cheers,
Chris
[1] pysqlite V1 & sqlite V3.