Large Dictionaries
Chris Foote
chris at foote.com.au
Tue May 16 09:29:01 EDT 2006
Claudio Grondi wrote:
> Chris Foote wrote:
>
>> However, please note that the Python bsddb module doesn't support
>> in-memory based databases - note the library documentation's[1] wording:
>>
>> "Files never intended to be preserved on disk may be created
>> by passing None as the filename."
>>
>> which closely mirrors the Sleepycat documentation[2]:
>>
>> "In-memory databases never intended to be preserved on disk
>> may be created by setting the file parameter to NULL."
>>
>> It does actually use a temporary file (in /var/tmp), for which
>> performance for my purposes is unsatisfactory:
>>
>> # keys   dictionary   metakit   bsddb        (all using psyco)
>> ------   ----------   -------   ---------
>> 1M       8.8s         22.2s     20m25s[3]
>> 2M       24.0s        43.7s     N/A
>> 5M       115.3s       105.4s    N/A
>>
>> Cheers,
>> Chris
>>
>> [1] bsddb docs:
>> http://www.python.org/doc/current/lib/module-bsddb.html
>>
>> [2] Sleepycat BerkeleyDB C API:
>> http://www.sleepycat.com/docs/api_c/db_open.html
>>
>> [3] Wall clock time. Storing the (long_integer, integer) key in
>> string form "long_integer:integer" since bsddb doesn't support keys
>> that aren't integers or strings.
>
> I have to admit that I haven't written any code of my own to actually
> test this, but if 20m25s for storing a single MByte of strings in a
> database table index column is really what you are getting, I can't
> shake the feeling that there is something fundamentally wrong with the
> way you are doing it.
Hi Claudio.

1M is one million, referring to the number of key insertions,
not a megabyte. I'm sorry that you took it that way :-(

Berkeley DB is great for accessing data by key for things already
stored on disk (i.e. read access), but write performance for each
key-value pair is slow because, by default, it is careful about
flushing writes to disk.
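
To make the units concrete, here is a minimal sketch of what "N insertions" means, together with the "long_integer:integer" string form described in footnote [3]. It is not the test code from this thread: it targets Python 3 rather than the Python 2 + psyco setup that was measured, it only exercises a plain dictionary, and the names encode_key, decode_key and time_insertions, as well as the key values, are hypothetical.

```python
import time


def encode_key(long_part, int_part):
    """Join a (long_integer, integer) pair into "long_integer:integer"."""
    return "%d:%d" % (long_part, int_part)


def decode_key(text):
    """Invert encode_key, returning the original pair of integers."""
    long_part, int_part = text.split(":")
    return int(long_part), int(int_part)


def time_insertions(n, use_string_keys=False):
    """Insert n key-value pairs into a dict.

    Returns (dict, elapsed wall-clock seconds).
    """
    store = {}
    start = time.perf_counter()
    for i in range(n):
        # Hypothetical key values; each i yields a distinct pair.
        key = (i * 10**12, i % 7)
        if use_string_keys:
            key = encode_key(*key)
        store[key] = i
    return store, time.perf_counter() - start


if __name__ == "__main__":
    # Small counts so the sketch finishes quickly; raise n to
    # 1000000 to match the "1M" row of the table.
    for n in (1000, 100000):
        _, seconds = time_insertions(n)
        print("%d insertions: %.3fs" % (n, seconds))
```

Timings from this sketch are not comparable with the table above, since the interpreter, hardware and key data all differ. It only shows that the row labels count insertions, and how a tuple key can be flattened into a string for a store that accepts only strings or integers as keys.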
> Posting the code for your test cases appears to me to be the only option
> to see what is the reason for the mystery you are getting here (this
> will clarify also the other mysterious things considered by the posters
> to this thread up to now).
I agree that posting some test code would have proved useful, but
the code is big and has too many dependencies on external
things (e.g. databases, threads and Pyro RPC calls) for me
to separate out examples easily. But if you go back to my
original posting, I think my question was quite clear.
Best regards,
Chris