Writing huge Sets() to disk

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Mon Jan 10 11:41:52 EST 2005


Robert Brewer wrote:
> Martin MOKREJŠ wrote:
> 
>>  I have sets.Set() objects having up to 20E20 items,
>>each is composed of up to 20 characters. Keeping
>>them in memory on a 1GB machine quickly puts me into swap.
>>I don't want to use the dictionary approach, as I don't see the sense
>>in storing None as a value. The items in a set are unique.
>>
>>  How can I write them efficiently to disk?
> 
> 
> got shelve*?

I know about shelve, but doesn't it work like a dictionary?
Why should I use shelve for this? In that case I'd guess it is faster
to use bsddb directly, with the string as the key and None as the value.
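
For illustration only, the "key with a dummy value" idea might look
roughly like the sketch below. It is not from the thread: it uses the
standard-library dbm module of Python 3 (where bsddb is no longer in the
standard library), stores an empty byte string instead of None, and
assumes the items are plain ASCII strings. The names save_set and
contains are hypothetical helpers.

```python
import dbm
import os
import tempfile


def save_set(path, items):
    """Store every item as a key; the value is an empty placeholder."""
    with dbm.open(path, "c") as db:  # "c": create the database if missing
        for item in items:
            db[item.encode("ascii")] = b""


def contains(path, item):
    """Membership test against the on-disk keys, without loading them all."""
    with dbm.open(path, "r") as db:
        return item.encode("ascii") in db


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    db_path = os.path.join(workdir, "set11")
    save_set(db_path, {"ACDEFGHIKLM", "MLKIHGFEDCA"})
    print(contains(db_path, "ACDEFGHIKLM"))
```

Which backend dbm.open picks depends on the platform, so speed and file
layout will vary.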

Even then, note that for the data contained in _set11 the index
should be (or at least could be) optimized for a key size of 11.
There are no other record sizes.

Similarly, _set15 has all keys of size 15. In the docs for bsddb, anydbm
and the other modules, I don't see how to optimize for that. Without
this optimization, I think it would be even slower. And shelve gives me
exactly that: an unoptimized, general-purpose index on top of a dictionary.
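
One way the fixed key size could be exploited, sketched here purely as
an assumption and not as something bsddb or shelve offer: write the
keys of one set sorted and back to back into a flat file, so record i
starts at byte offset i * width, and look items up by binary search
with seek(). The names write_fixed and lookup_fixed are hypothetical,
the items are assumed to be ASCII, and the sort is done in memory here,
which a really huge set would have to replace with an external sort.

```python
import os
import tempfile


def write_fixed(path, items, width):
    """Write sorted keys of identical length with no separators."""
    with open(path, "wb") as f:
        for item in sorted(items):
            raw = item.encode("ascii")
            if len(raw) != width:
                raise ValueError("expected %d bytes, got %r" % (width, item))
            f.write(raw)


def lookup_fixed(path, item, width):
    """Binary search over the fixed-width records in the file."""
    target = item.encode("ascii")
    if len(target) != width:
        return False
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // width  # hi = record count
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * width)  # record mid starts at this offset
            record = f.read(width)
            if record == target:
                return True
            if record < target:
                lo = mid + 1
            else:
                hi = mid
    return False


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "set11.bin")
    write_fixed(path, {"ACDEFGHIKLM", "MLKIHGFEDCA", "CCCCCCCCCCC"}, 11)
    print(lookup_fixed(path, "CCCCCCCCCCC", 11))
```

Such a file needs no per-record index at all, but it is read-only in
practice: adding an item means rewriting or merging the file.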

Maybe I'm wrong, I'm just a beginner here.
Thanks
M.


