Python Berkeley DB and speed issues

Thomas Weholt thomas at bibsyst.no
Tue Sep 28 02:59:30 EDT 1999


Hi,

I was just wondering how Python performs, speed-wise, with its
built-in dictionaries. I'm converting a Perl project where a Berkeley DB
hash table stored several million entries about files stored on CD-ROMs.
The speed of the Perl project was OK, and it seems that Python takes a
very similar approach, using the pickle and shelve modules (or
Berkeley DB). My question is: will the speed drop as the data grows if I
use pickle and shelve, or do I have to use Berkeley DB? We're talking
millions of entries here. The whole thing will be indexed, so that you
look up words appearing in file entries in a separate table and get a
list of keys back. It's not a problem to use Berkeley DB, but if I don't
have to ... I'd prefer to use the built-in stuff.
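
In case it helps picture what I mean, here is a rough sketch of how the
two tables could be kept with nothing but the shelve module. The file
names ('files.db', 'index.db') and the entry layout are just placeholders,
not what the Perl version actually uses:

    import shelve

    files = shelve.open('files.db')    # file key -> entry (any picklable object)
    index = shelve.open('index.db')    # word -> list of file keys

    def add_entry(key, entry, words):
        files[key] = entry
        for word in words:
            if index.has_key(word):
                keys = index[word]
            else:
                keys = []
            keys.append(key)
            index[word] = keys         # reassign so shelve writes the change back

    def lookup(word):
        if index.has_key(word):
            return index[word]
        return []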

People advised me to use a bigger database system instead of the built-in
stuff and Berkeley DB, but those worked fine. Berkeley DB is a standard
core module with Python 1.5, isn't it? Any experience with similar
problems?
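
For comparison, the word table could also go through the bsddb module
instead of shelve. As far as I understand, bsddb only stores strings, so
the list of keys would have to be packed into one string; again this is
only a sketch with made-up names:

    import bsddb
    import string

    index = bsddb.hashopen('index.db', 'c')   # hash-format Berkeley DB file

    def add_word(word, filekey):
        if index.has_key(word):
            # append to the whitespace-separated list of keys
            index[word] = index[word] + ' ' + filekey
        else:
            index[word] = filekey

    def lookup(word):
        if index.has_key(word):
            return string.split(index[word])
        return []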

If anybody has any input on how to solve/optimize this problem, I'd
appreciate it. The whole thing is going to be a CD-indexing project
released under the GPL license, with advanced search, report and
scanning features (I already have a working Perl version, but want to
move it to Python as soon as possible :-> ).

Thomas Weholt



