Large Dictionaries

Chris Foote chris at foote.com.au
Wed May 17 09:28:20 EDT 2006


Klaas wrote:
>> 22.2s  20m25s[3]
> 
> 20m to insert 1m keys?  You are doing something wrong.

Hi Mike.

I've put together some simplified test code, but the bsddb
module still takes about 11 minutes for 1M keys:

Number generator test for 1000000 number ranges
         with a maximum of 3 wildcard digits.
Wed May 17 22:18:17 2006 dictionary population started
Wed May 17 22:18:26 2006 dictionary population stopped, duration 8.6s
Wed May 17 22:18:27 2006 StorageBerkeleyDB population started
Wed May 17 22:29:32 2006 StorageBerkeleyDB population stopped, duration 665.6s
Wed May 17 22:29:33 2006 StorageSQLite population started
Wed May 17 22:30:38 2006 StorageSQLite population stopped, duration 65.5s

Test code is attached.
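
The attachment is not reproduced in this message. For readers who want to
repeat this kind of measurement, here is a minimal sketch of a timing harness.
It is not the attached script. It assumes a modern Python 3, covers only the
dictionary and SQLite cases, and uses zero-padded strings as stand-in keys.
The names make_keys, timed, populate_dict and populate_sqlite are hypothetical.

```python
import sqlite3
import time


def make_keys(count):
    # Hypothetical stand-in for the number ranges in the real test:
    # zero-padded decimal strings such as "0000042".
    return ["%07d" % i for i in range(count)]


def timed(label, populate, keys):
    # Run populate(keys), print start/stop lines, return (result, seconds).
    print("%s %s population started" % (time.ctime(), label))
    start = time.perf_counter()
    result = populate(keys)
    duration = time.perf_counter() - start
    print("%s %s population stopped, duration %.1fs"
          % (time.ctime(), label, duration))
    return result, duration


def populate_dict(keys):
    # Baseline: a plain in-memory dictionary.
    store = {}
    for key in keys:
        store[key] = key
    return store


def populate_sqlite(keys):
    # In-memory SQLite database; all inserts share one transaction,
    # which is committed when the "with" block exits.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (k TEXT PRIMARY KEY, v TEXT)")
    with conn:
        conn.executemany("INSERT INTO numbers VALUES (?, ?)",
                         ((key, key) for key in keys))
    return conn


if __name__ == "__main__":
    keys = make_keys(100000)
    timed("dictionary", populate_dict, keys)
    timed("StorageSQLite", populate_sqlite, keys)
```

Timings from a sketch like this will not match the figures above, since the
key set, the hardware and the on-disk versus in-memory storage all differ.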

> With bdb's it is crucial to insert keys in bytestring-sorted order.

For the bsddb test, I'm using a plain string.  (The module docs list
strings as the only datatype supported for both keys & values.)
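
For reference, inserting in bytestring-sorted order could look roughly like
the sketch below. It assumes a modern Python 3 and any mapping-like store that
accepts bytes keys; a plain dict stands in for the database here, and
insert_sorted is a hypothetical helper, not part of bsddb.

```python
def insert_sorted(db, items):
    # items: iterable of (key, value) pairs of ASCII strings.
    # Encode first, then sort on the encoded key, so the order is the
    # byte-wise order the database sees rather than numeric order.
    encoded = [(k.encode("ascii"), v.encode("ascii")) for k, v in items]
    encoded.sort(key=lambda pair: pair[0])
    for key, value in encoded:
        db[key] = value
    # Return the keys in the order they were inserted.
    return [key for key, _ in encoded]


if __name__ == "__main__":
    store = {}
    order = insert_sorted(store, [("9", "a"), ("10", "b"), ("100", "c")])
    print(order)
```

Note that byte-wise order is not numeric order: "10" and "100" both sort
before "9" unless the keys are zero-padded to a fixed width.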

> Also, be sure to give it a decent amount of cache.

The bsddb.hashopen() factory seems to have a bug in this regard; if you
supply a cachesize argument, then it barfs:

...
   File "bsddb-test.py", line 67, in runtest
     db = bsddb.hashopen(None, flag='c', cachesize=8192)
   File "/usr/lib/python2.4/bsddb/__init__.py", line 288, in hashopen
     if cachesize is not None: d.set_cachesize(0, cachesize)
bsddb._db.DBInvalidArgError: (22, 'Invalid argument -- DB->set_cachesize: method not permitted when environment specified')


I'll file a bug report on this if it isn't already fixed.

Cheers,
Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bsddb-test.py
Type: text/x-python
Size: 3387 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20060517/84f0df16/attachment.py>
