Berkeley DB: how to iterate over a large number of keys "quickly"

marduk marduk at nbk.hopto.org
Thu Aug 2 16:01:27 EDT 2007


On Thu, 2007-08-02 at 19:43 +0000, lazy wrote:
> I have a Berkeley DB and I'm using the bsddb module to access it. The DB
> is quite large (anywhere from 2 to 30 GB). I want to iterate over the keys
> serially.
> I tried something basic like
> 
> for key in db.keys():
> 
> but this takes a lot of time. I guess Python is trying to get the list
> of all keys first and probably keeps it in memory. Is there a way to
> avoid this, since I just want to access the keys serially? I mean, is there
> a way I can tell Python not to load all the keys, but to fetch them as
> the loop progresses (like in a linked list)? I couldn't find any accessor
> methods on bsddb to do this in my initial search.
> I am guessing BTree might be a good choice here, but since the DBs
> were opened with hashopen when they were written, I'm not able to use
> btopen when I want to iterate over the db.
> 

Try this instead:

key = db.firstkey()
while key is not None:
    # do something with db[key]
    key = db.nextkey(key)
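
The same loop can be wrapped in a generator, so the calling code stays an
ordinary for loop and only one key is held at a time. This is only a sketch:
it assumes the database handle exposes firstkey() and nextkey(key) (the
gdbm-style names used above), which may not hold for every bsddb handle.
iter_keys and FakeDB are hypothetical names; FakeDB is just an in-memory
stand-in so the example runs without a real database file.

```python
def iter_keys(db):
    """Yield keys one at a time via a firstkey()/nextkey() style interface.

    Assumption: db.firstkey() returns None for an empty database and
    db.nextkey(key) returns None after the last key.
    """
    key = db.firstkey()
    while key is not None:
        yield key
        key = db.nextkey(key)


class FakeDB:
    """Hypothetical in-memory stand-in for a real database handle."""

    def __init__(self, data):
        self._data = dict(data)
        self._order = list(self._data)  # fixed traversal order for the demo

    def firstkey(self):
        return self._order[0] if self._order else None

    def nextkey(self, key):
        i = self._order.index(key) + 1
        return self._order[i] if i < len(self._order) else None

    def __getitem__(self, key):
        return self._data[key]


db = FakeDB({b"a": b"1", b"b": b"2", b"c": b"3"})

# The full key list is never built; each key is fetched as the loop advances.
seen = [(key, db[key]) for key in iter_keys(db)]
```

With a real handle, only the FakeDB line would change; the generator itself
does not care what sits behind firstkey() and nextkey().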




