Writing huge Sets() to disk

Tim Peters tim.peters at gmail.com
Mon Jan 10 15:45:22 EST 2005


[Martin MOKREJŠ]
> ...
> 
> I gave up the theoretical approach. Practically, I might need to
> store up to maybe those 1E15 keys.

We should work on our multiplication skills here <wink>.  You don't
have enough disk space to store 1E15 keys.  If your keys were just one
byte each, you would need to have 4 thousand disks of 250GB each to
store 1E15 keys.  How much disk space do you actually have?  I'm
betting you have no more than one 250GB disk.
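
For anyone who wants to check the multiplication, here is a minimal
sketch of the arithmetic in Python. It assumes decimal units (1 GB =
10**9 bytes), and the function name `disks_needed` is made up purely for
illustration.

```python
def disks_needed(num_keys, bytes_per_key, disk_bytes):
    """Return how many disks of size disk_bytes hold num_keys keys."""
    total_bytes = num_keys * bytes_per_key
    # Ceiling division: a partly filled disk still counts as a disk.
    return -(-total_bytes // disk_bytes)


NUM_KEYS = 10**15          # the 1E15 keys from the question
DISK_BYTES = 250 * 10**9   # one 250GB disk, assuming 1 GB = 10**9 bytes

# One byte per key: the best case used in the text.
print(disks_needed(NUM_KEYS, 1, DISK_BYTES))   # 4000
```

Real keys are longer than one byte, so the true count only goes up from
there.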

...

[Istvan Albert]
>> On my system storing 1 million words of length 15
>> as keys of a python dictionary is around 75MB.

> Fine, that's what I wanted to hear. How do you improve the algorithm?
> Do you delay indexing to the very latest moment or do you let your
> computer index 999 999 times just for fun?

It remains wholly unclear to me what "the algorithm" you want might
be.  As I mentioned before, if you store keys in sorted text files,
you can do intersection and difference very efficiently just by using
the Unix `comm` utility.
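
If you would rather stay in Python, the same single-pass merge idea can
be sketched as below. This is only an illustration, not a replacement
for `comm`: it assumes each input is already sorted in plain string
(code point) order, for example by `LC_ALL=C sort`, and that each key
appears at most once per input. The names `merge_sorted_keys`,
`intersection`, `difference` and `read_keys` are hypothetical.

```python
def merge_sorted_keys(keys_a, keys_b):
    """Walk two sorted iterables of unique keys in one pass.

    Yields (key, in_a, in_b) for every key seen in either input.
    """
    it_a, it_b = iter(keys_a), iter(keys_b)
    a, b = next(it_a, None), next(it_b, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a < b):
            yield a, True, False      # key only in the first input
            a = next(it_a, None)
        elif a is None or b < a:
            yield b, False, True      # key only in the second input
            b = next(it_b, None)
        else:
            yield a, True, True       # key in both inputs
            a, b = next(it_a, None), next(it_b, None)


def intersection(keys_a, keys_b):
    """Keys present in both sorted inputs."""
    return [k for k, in_a, in_b in merge_sorted_keys(keys_a, keys_b)
            if in_a and in_b]


def difference(keys_a, keys_b):
    """Keys present in the first sorted input but not the second."""
    return [k for k, in_a, in_b in merge_sorted_keys(keys_a, keys_b)
            if in_a and not in_b]


def read_keys(path):
    """Yield one key per line from a sorted text file, newline removed."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


print(intersection(["a", "c", "d"], ["b", "c", "d", "e"]))
print(difference(["a", "c", "d"], ["b", "c", "d", "e"]))
```

With real files you would pass `read_keys("a.txt")` and
`read_keys("b.txt")` (file names made up here) instead of the lists.
Only one line from each file is held in memory at a time during the
merge; for results too big for a list, loop over `merge_sorted_keys`
directly and write the keys out as you go.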


