Berkeley DB. How to iterate over a large number of keys "quickly"

Ian Clark iclark at mail.ewu.edu
Thu Aug 2 16:42:32 EDT 2007


lazy wrote:
> I have a Berkeley DB and I'm using the bsddb module to access it. The DB
> is quite huge (anywhere from 2-30GB). I want to iterate over the keys
> serially.
> I tried using something basic like
> 
> for key in db.keys():
> 
> but this takes a lot of time. I guess Python is trying to get the list
> of all keys first and probably keeps it in memory. Is there a way to
> avoid this, since I just want to access the keys serially? I mean, is
> there a way I can tell Python not to load all keys, but to access them
> as the loop progresses (like in a linked list)? I couldn't find any
> accessor methods on bsddb to do this with my initial search.
> I am guessing a BTree might be a good choice here, but since the DBs
> were written with hashopen, I'm not able to use btopen when I want to
> iterate over the db.
> 

db.iterkeys()

The documentation for bsddb objects[1] mentions that "Once 
instantiated, hash, btree and record objects support the same methods as 
dictionaries." The dict documentation[2] then lists the 
dict.iterkeys() method, which should do what you're asking.
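
Untested sketch (the file name and the per-key work are just placeholders;
I'm assuming the database was created with hashopen, as you describe):

  import bsddb

  db = bsddb.hashopen('/path/to/huge.db', 'r')  # read-only open; path is made up
  count = 0
  for key in db.iterkeys():   # keys are produced lazily, one at a time
      count += 1              # replace with whatever per-key work you need
  db.close()
  print count

Unlike db.keys(), this never builds the full list of keys in memory.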

Ian

[1] http://docs.python.org/lib/bsddb-objects.html
[2] http://docs.python.org/lib/typesmapping.html



