Berkeley DB: How to iterate over a large number of keys "quickly"

lazy arunmail at gmail.com
Thu Aug 2 22:03:38 EDT 2007


On Aug 2, 1:42 pm, Ian Clark <icl... at mail.ewu.edu> wrote:
> lazy wrote:
> > I have a Berkeley DB and I'm using the bsddb module to access it. The DB
> > is quite large (anywhere from 2-30 GB). I want to iterate over the keys
> > serially.
> > I tried using something basic like
>
> > for key in db.keys():
>
> > but this takes a lot of time. I guess Python is trying to build the list
> > of all keys first and probably keeps it in memory. Is there a way to
> > avoid this, since I just want to access the keys serially? I mean, is there
> > a way I can tell Python not to load all the keys, but to fetch them as
> > the loop progresses (like in a linked list)? I couldn't find any accessor
> > methods on bsddb to do this in my initial search.
> > I'm guessing a BTree might be a good choice here, but since the DBs
> > were written with hashopen, I'm not able to use btopen when I want to
> > iterate over them.
>
> db.iterkeys()
>
> Looking at the docs for bsddb objects[1], it mentions that "Once
> instantiated, hash, btree and record objects support the same methods as
> dictionaries." Then, looking at the dict documentation[2], you'll find the
> dict.iterkeys() method, which should do what you're asking.
>
> Ian
>
> [1] http://docs.python.org/lib/bsddb-objects.html
> [2] http://docs.python.org/lib/typesmapping.html


Thanks. I tried using db.first() and then db.next() for subsequent keys;
it seems to be faster. Thanks for the pointers.
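
In case it helps anyone searching the archives later, here is roughly what
my loop looks like now. This is just a sketch: the filename is a
placeholder, and it relies on the legacy bsddb handle that hashopen returns.

    import bsddb

    # open the existing hash-format file read-only
    db = bsddb.hashopen('/path/to/big.db', 'r')

    key, value = db.first()          # position on the first record
    while 1:
        # ... do whatever is needed with key (and value) here ...
        try:
            key, value = db.next()   # advance one record at a time
        except KeyError:
            # bsddb raises DBNotFoundError (a KeyError subclass)
            # once the cursor moves past the last record
            break

    db.close()

If iterating over the handle directly (or iterkeys(), as Ian suggested) is
also lazy, it should avoid the big up-front keys() list in the same way; I
haven't timed that against the cursor loop.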



