Writing huge Sets() to disk

Robert Brewer fumanchu at amor.org
Mon Jan 10 13:14:24 EST 2005


Martin MOKREJŠ wrote:
> Robert Brewer wrote:
> > Martin MOKREJŠ wrote:
> > 
> >>  I have sets.Set() objects having up to 20E20 items,
> >>each is composed of up to 20 characters. Keeping
> >>them in memory on a 1GB machine quickly puts me into swap.
> >>I don't want to use the dictionary approach, as I don't see the sense
> >>in storing None as a value. The items in a set are unique.
> >>
> >>  How can I write them efficiently to disk?
> > 
> > 
> > got shelve*?
> 
> I know about shelve, but doesn't it work like a dictionary?
> Why should I use shelve for this? In that case it's faster to use
> bsddb directly, with the string as the key and None as the value, I'd guess.

If you're using Python 2.3, then a sets.Set *is* implemented with a dictionary, with None values. It simply has some extra methods to make it behave like a set. In addition, the Set class already has built-in methods for pickling and unpickling.
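For reference, the pickling route looks roughly like this in modern Python 3, where the built-in `set` has replaced `sets.Set`. This is a minimal sketch, assuming the whole set still fits in memory (which is exactly the limit the original poster is hitting); the file name and sample data are hypothetical.

```python
import os
import pickle
import tempfile


def save_set(items, path):
    """Pickle a whole set to `path` in one shot (the set must fit in RAM)."""
    with open(path, "wb") as fh:
        pickle.dump(items, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_set(path):
    """Read back a set written by save_set()."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical data: short strings, like the items described in the question.
words = {"ACGTACGT", "TTGACA", "GATTACA"}
path = os.path.join(tempfile.mkdtemp(), "words.pickle")  # hypothetical file name
save_set(words, path)
restored = load_set(path)
print(restored == words)  # True
```

Because pickle writes and reads the entire set at once, this does not help once the data outgrows memory; a disk-backed key store is the better fit at that point.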

So it's probably faster to use bsddb directly, but why not find out by trying two lines of code that use shelve? The time-consuming part of your quest is writing the timed test suite that will indicate which route is fastest, and you'll have to write that regardless.
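A timed comparison along those lines might start from something like the following. It is a sketch, not a benchmark: `bsddb` was removed in Python 3, so the standard-library `dbm` module stands in for "using bsddb directly", and the helper names, file names, and the 20,000-item sample data are all hypothetical. Absolute timings depend on which dbm backend your Python build selects, so no numbers are shown.

```python
import dbm
import os
import shelve
import tempfile
import time


def write_with_shelve(path, items):
    """Hypothetical helper: store each item as a shelve key with a dummy None value."""
    with shelve.open(path, flag="n") as db:
        for item in items:
            db[item] = None  # shelve pickles the value; only the key matters


def write_with_dbm(path, items):
    """Hypothetical helper: store each item directly as a dbm key."""
    with dbm.open(path, "n") as db:
        for item in items:
            db[item] = b"1"  # dbm stores raw bytes, so there is no pickling step


def timed(label, func, *args):
    """Run func(*args) once and report the wall-clock time it took."""
    start = time.perf_counter()
    func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return elapsed


# Hypothetical test data: 20,000 distinct 20-character strings.
items = [f"item{n:016d}" for n in range(20_000)]

workdir = tempfile.mkdtemp()
shelve_path = os.path.join(workdir, "items_shelve")
dbm_path = os.path.join(workdir, "items_dbm")

timed("shelve", write_with_shelve, shelve_path, items)
timed("dbm", write_with_dbm, dbm_path, items)

# Both stores answer membership tests the same way a set does.
probe = f"item{42:016d}"
with shelve.open(shelve_path, flag="r") as db:
    print(probe in db)  # True
with dbm.open(dbm_path, "r") as db:
    print("not-an-item" in db)  # False
```

The shelve version costs one extra pickling step per value, which is the overhead the question is worried about; whether that matters at your data sizes is exactly what a loop like this measures.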


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org
