Best dbm to use?

brianc at temple.edu
Wed Sep 7 17:07:42 EDT 2005


I'm creating a persistent index of a large (63 GB) file
containing millions of pieces of data. For this I would
naturally use one of Python's dbm modules. But which is the
best to use?

The index would be created with something like this:
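fh=open('file_to_index')
db=dbhash.open('file_to_index.idx','c')  # 'c' creates the index if missing
pos=fh.tell()              # offset where the first object starts
for obj in fh:
    db[obj.name]=pos       # store the start of obj, not its end
    pos=fh.tell()          # offset where the next object starts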
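
A minimal runnable sketch of the same idea with the current standard-library dbm module is given here for illustration. It assumes one record per line and takes the first whitespace-separated field as the record's name; neither assumption comes from the post, and the function names build_index and lookup are hypothetical.

```python
import dbm

def build_index(data_path, index_path):
    """Map each record's name to the byte offset where the record starts."""
    # Binary mode, so len(line) is an exact byte count.
    with open(data_path, "rb") as fh, dbm.open(index_path, "c") as db:
        offset = 0
        for line in fh:
            fields = line.split(None, 1)
            if fields:  # skip blank lines
                # Assumed layout: the name is the first field of the line.
                # dbm stores bytes, so the offset is saved as ASCII digits.
                db[fields[0]] = str(offset).encode("ascii")
            offset += len(line)

def lookup(data_path, index_path, name):
    """Random access: seek to the stored offset and read one record."""
    with open(data_path, "rb") as fh, dbm.open(index_path, "r") as db:
        fh.seek(int(db[name]))
        return fh.readline()
```

Note that the offset is captured before the record is consumed, so a later seek lands on the start of the record rather than on the one after it.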

The index should serve two purposes: random access and
sequential, stepped access. Random access could be handled
by the hash table's key lookup, for example:
fh.seek(db[name])
obj=fh.GetObj()

However, I may also want to access the i-th element in the file.
Something like this:
fh.seek(db.GetElement(i))
obj=fh.GetObj()
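
One way to picture positional access without a hash table is a side file of fixed-width offsets, where entry i sits at byte 8*i. The sketch below assumes one record per line (blank lines count as records here) and uses hypothetical names build_positions and get_element; it is an illustration, not something any dbm module provides.

```python
import struct

# One unsigned 64-bit little-endian offset per record.
OFFSET = struct.Struct("<Q")

def build_positions(data_path, pos_path):
    """Write the start offset of every record, in file order, as 8-byte entries."""
    with open(data_path, "rb") as fh, open(pos_path, "wb") as out:
        offset = 0
        for line in fh:
            out.write(OFFSET.pack(offset))
            offset += len(line)

def get_element(data_path, pos_path, i):
    """Return the i-th record (0-based) using one seek in each file."""
    if i < 0:
        raise IndexError(i)
    with open(pos_path, "rb") as pos, open(data_path, "rb") as fh:
        pos.seek(i * OFFSET.size)
        entry = pos.read(OFFSET.size)
        if len(entry) < OFFSET.size:  # i is past the last record
            raise IndexError(i)
        (offset,) = OFFSET.unpack(entry)
        fh.seek(offset)
        return fh.readline()
```

At 8 bytes per record, even tens of millions of records need only a few hundred megabytes for this side file.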

This is where the hash table breaks down and a B-tree would
serve my purpose better. Is there a unified data structure
that I could use, or am I doomed to maintaining two separate
indexes?
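
As one possible illustration of a single index serving both lookups, the sketch below uses the standard-library sqlite3 module (which postdates this post). A table keyed by an integer position, with a UNIQUE name column, answers both queries; SQLite still keeps two B-trees internally, but they are maintained together as one table. The schema, the function names, the one-record-per-line layout, and the UTF-8 name in the first field are all assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS records (
    pos    INTEGER PRIMARY KEY,   -- 0-based position of the record in the file
    name   TEXT UNIQUE NOT NULL,  -- assumed: first field of the record
    offset INTEGER NOT NULL       -- byte offset where the record starts
)
"""

def build_sqlite_index(data_path, index_path):
    """Fill a fresh index file with (position, name, offset) rows."""
    con = sqlite3.connect(index_path)
    with con:  # one transaction, committed when the block exits cleanly
        con.execute(SCHEMA)
        offset = 0
        pos = 0
        with open(data_path, "rb") as fh:
            for line in fh:
                fields = line.split(None, 1)
                if fields:  # skip blank lines
                    con.execute(
                        "INSERT INTO records VALUES (?, ?, ?)",
                        (pos, fields[0].decode("utf-8"), offset),
                    )
                    pos += 1
                offset += len(line)
    con.close()

def offset_by_name(con, name):
    """Random access: offset of the record with this name, or None."""
    row = con.execute(
        "SELECT offset FROM records WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]

def offset_by_position(con, i):
    """Stepped access: offset of the i-th record (0-based), or None."""
    row = con.execute(
        "SELECT offset FROM records WHERE pos = ?", (i,)
    ).fetchone()
    return None if row is None else row[0]
```

Either function returns a byte offset that can be passed to fh.seek() on the data file, so the two access patterns share one index file.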

Thanks in advance for any help.

-Brian


