Large Dictionaries

Chris Foote chris at foote.com.au
Tue May 16 09:29:01 EDT 2006


Claudio Grondi wrote:
> Chris Foote wrote:
>
>> However, please note that the Python bsddb module doesn't support
>> in-memory based databases - note the library documentation's[1] wording:
>>
>>     "Files never intended to be preserved on disk may be created
>>     by passing None as the filename."
>>
>> which closely mirrors the Sleepycat documentation[2]:
>>
>>     "In-memory databases never intended to be preserved on disk
>>     may be created by setting the file parameter to NULL."
>>
>> It does actually use a temporary file (in /var/tmp), for which 
>> performance for my purposes is unsatisfactory:
>>
>> (all using psyco)
>>
>> # keys   dictionary  metakit  bsddb
>> ------   ----------  -------  ---------
>> 1M             8.8s    22.2s  20m25s[3]
>> 2M            24.0s    43.7s  N/A
>> 5M           115.3s   105.4s  N/A
>>
>> Cheers,
>> Chris
>>
>> [1] bsddb docs:
>>     http://www.python.org/doc/current/lib/module-bsddb.html
>>
>> [2] Sleepycat BerkeleyDB C API:
>>     http://www.sleepycat.com/docs/api_c/db_open.html
>>
>> [3] Wall clock time.  Storing the (long_integer, integer) key in 
>> string form "long_integer:integer" since bsddb doesn't support keys 
>> that aren't integers or strings.
>
> I have to admit that I haven't written any code of my own to test
> this, but if 20m25s for storing a single MByte of strings in a
> database table index column is really what you are getting, I can't
> shake the feeling that there is something fundamentally wrong with
> the way you are doing it.

Hi Claudio.

1M is one million, referring to the number of key insertions,
not a megabyte.  I'm sorry that you took it that way :-(

Berkeley DB is great for keyed read access to data already stored
on disk, but writing each key-value pair is slow because, by
default, it is careful about flushing writes to disk.
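
To illustrate why flushing every write is costly, here is a minimal
sketch in modern Python 3.  It does not use Berkeley DB or bsddb at
all: it assumes that forcing each record to disk with os.fsync() is a
fair stand-in for a store that flushes carefully, and compares that
against ordinary buffered writes.  The function names, record format
and record count are hypothetical, chosen only for the example.

```python
import os
import tempfile
import time


def write_records(path, records, sync_each):
    """Write key=value lines to path, optionally syncing after each one."""
    with open(path, "wb") as f:
        for key, value in records:
            f.write(key + b"=" + value + b"\n")
            if sync_each:
                f.flush()
                os.fsync(f.fileno())  # force this record to stable storage


def timed_write(path, records, sync_each):
    """Return the wall clock seconds taken by write_records()."""
    start = time.perf_counter()
    write_records(path, records, sync_each)
    return time.perf_counter() - start


def compare(n=2000):
    """Time buffered vs. per-record-synced writes of n small records."""
    # Hypothetical workload: keys in "long_integer:integer" string form.
    records = [(b"%d:%d" % (10**12 + i, i % 7), b"%d" % i) for i in range(n)]
    with tempfile.TemporaryDirectory() as tmp:
        buffered = os.path.join(tmp, "buffered.dat")
        synced = os.path.join(tmp, "synced.dat")
        t_buffered = timed_write(buffered, records, sync_each=False)
        t_synced = timed_write(synced, records, sync_each=True)
        # Both strategies must leave identical bytes on disk.
        with open(buffered, "rb") as a, open(synced, "rb") as b:
            identical = a.read() == b.read()
    return t_buffered, t_synced, identical
```

Calling compare() returns the two timings and a flag confirming both
files hold the same data.  How large the gap is depends heavily on the
disk and filesystem, so no particular ratio should be expected.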

> Posting the code for your test cases appears to me to be the only way
> to find the reason for the mystery you are seeing here (it would also
> clarify the other mysterious things raised by the posters to this
> thread up to now).

I agree that posting some test code would have proved useful, but
the code is big and has too many dependencies on external things
(e.g. databases, threads & pyro RPC calls) to allow me to separate
out examples easily.  But if you go back to my original posting,
I think my question was quite clear.
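
For readers who want something to experiment with, here is a
stripped-down sketch of this kind of insertion test.  It is not the
test code discussed above: it is written for modern Python 3, uses no
psyco, metakit or bsddb, and the key values are made up.  It only shows
a plain dictionary being filled with (long_integer, integer) tuple keys
and with the same keys encoded as "long_integer:integer" strings.  All
function names are hypothetical.

```python
import time


def make_keys(n):
    """Build n distinct (long_integer, integer) keys (made-up values)."""
    return [(10**12 + i, i % 100) for i in range(n)]


def encode_key(key):
    """Encode a (long_integer, integer) pair as 'long_integer:integer'."""
    return "%d:%d" % key


def decode_key(text):
    """Invert encode_key(), returning the original tuple."""
    big, small = text.split(":")
    return int(big), int(small)


def time_inserts(keys):
    """Insert every key into a fresh dict; return (seconds, dict)."""
    table = {}
    start = time.perf_counter()
    for i, key in enumerate(keys):
        table[key] = i
    return time.perf_counter() - start, table


if __name__ == "__main__":
    keys = make_keys(1_000_000)  # "1M" means one million insertions
    t_tuple, _ = time_inserts(keys)
    t_string, _ = time_inserts([encode_key(k) for k in keys])
    print("tuple keys:  %.2fs" % t_tuple)
    print("string keys: %.2fs" % t_string)
```

The timings printed will vary by machine and Python version and are
not comparable with the figures in the table quoted earlier.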

Best regards,
Chris



