Databases and python

Dan Stromberg strombrg at dcs.nac.uci.edu
Thu Feb 16 02:05:38 EST 2006


I've been putting a little bit of time into a file indexing engine in
python, which you can find here:
http://dcs.nac.uci.edu/~strombrg/pyindex.html

It'll do 40,000 mail messages of varying lengths pretty well now, but I
want more :)

So far, I've been taking the approach of using single-table databases
like gdbm or dbhash (actually a small number of them, to map filenames to
numbers, numbers to filenames, words to numbers, numbers to words, and
numbered words to numbered filenames). Each entry is keyed by a word, and
the value stored under that word is a null-terminated list of filenames
(in http://dcs.nac.uci.edu/~strombrg/base255.html representation).

However, despite using the http://dcs.nac.uci.edu/~strombrg/cachedb.html
module to speed up database access, bringing in psyco, and studying various
Python optimization pages, the program just isn't performing the way I'd
like it to.

And I think it's because, despite the caching and the minimal
representation conversion, it's still just too slow to keep converting
those serialized lists to in-memory arrays and back again, over and over.
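A minimal sketch of why that back-and-forth hurts, using a plain dict as a stand-in for the on-disk table (helper names hypothetical): appending postings one at a time rewrites the whole null-separated list on every insert, so N postings for one word cost O(N^2) bytes of copying, while batching in memory and serializing once per word copies each byte only once.

```python
def append_each_time(store, word, file_ids):
    """Rewrite the stored posting list on every single append."""
    copied = 0
    for fid in file_ids:
        old = store.get(word, b"")
        entry = str(fid).encode()
        new = entry if not old else old + b"\0" + entry
        store[word] = new          # whole list re-written each time
        copied += len(new)
    return copied

def batch_then_write(store, word, file_ids):
    """Accumulate in memory, serialize once, write once."""
    value = b"\0".join(str(fid).encode() for fid in file_ids)
    store[word] = value
    return len(value)

a, b = {}, {}
n = 1000
cost_a = append_each_time(a, "python", range(n))
cost_b = batch_then_write(b, "python", range(n))
assert a["python"] == b["python"]  # same stored result
assert cost_a > 50 * cost_b        # quadratic vs. linear byte traffic
```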

So this leads me to wonder - is there a python database interface that
would allow me to define a -lot- of tables?  Like, each word becomes a
table, and then the fields in that table are just the filenames that
contained that word.  That way adding filenames to a word shouldn't bog
down much at all.
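For what it's worth, the per-word-table idea can be mocked up with sqlite3, just to make the shape concrete (the table-naming scheme and helper names here are hypothetical, and whether an engine stays happy with hundreds of thousands of such tables is exactly the open question):

```python
import sqlite3

def table_for(word):
    # Hypothetical naming scheme; a real indexer would have to sanitize
    # arbitrary words into legal table names.
    return "word_" + word

def add_posting(conn, word, file_id):
    t = table_for(word)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{t}" (file_id INTEGER)')
    conn.execute(f'INSERT INTO "{t}" (file_id) VALUES (?)', (file_id,))

def lookup(conn, word):
    t = table_for(word)
    return [row[0] for row in
            conn.execute(f'SELECT file_id FROM "{t}" ORDER BY rowid')]

conn = sqlite3.connect(":memory:")
add_posting(conn, "python", 1)
add_posting(conn, "python", 7)
add_posting(conn, "gdbm", 7)
print(lookup(conn, "python"))   # -> [1, 7]
```

The more conventional relational layout would be a single (word, file_id) table with an index on word, but the sketch above matches the many-tables idea as described.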

-But-, are there any database interfaces for python that aren't going to
get a bit upset if you try to give them hundreds of thousands of tables?

Thanks!

More information about the Python-list mailing list