Writing huge Sets() to disk
Tim Peters
tim.peters at gmail.com
Mon Jan 10 15:45:22 EST 2005
[Martin MOKREJŠ]
> ...
>
> I gave up the theoretical approach. Practically, I might need to
> store up to maybe 1E15 keys.
We should work on our multiplication skills here <wink>. You don't
have enough disk space to store 1E15 keys. If your keys were just one
byte each, you would need to have 4 thousand disks of 250GB each to
store 1E15 keys. How much disk space do you actually have? I'm
betting you have no more than one 250GB disk.
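To make the arithmetic above concrete, here is a minimal sketch in Python. The function name `disks_needed` and its parameters are invented for illustration; it assumes decimal gigabytes (1GB = 10**9 bytes) and ignores any filesystem or index overhead, so real requirements would only be higher.

```python
def disks_needed(num_keys, bytes_per_key, disk_bytes):
    """Return how many disks of disk_bytes each are needed to hold the keys.

    Hypothetical helper: counts raw key bytes only, with no overhead.
    """
    total_bytes = num_keys * bytes_per_key
    # Ceiling division, so a partly filled last disk still counts as a disk.
    return -(-total_bytes // disk_bytes)


DISK_250GB = 250 * 10**9  # assumes decimal gigabytes

# One-byte keys, as in the estimate above.
print(disks_needed(10**15, 1, DISK_250GB))   # 4000

# 15-byte keys, the word length quoted elsewhere in the thread.
print(disks_needed(10**15, 15, DISK_250GB))  # 60000
```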
...
[Istvan Albert]
>> On my system storing 1 million words of length 15
>> as keys of a python dictionary is around 75MB.
> Fine, that's what I wanted to hear. How do you improve the algorithm?
> Do you delay indexing to the very latest moment or do you let your
> computer index 999 999 times just for fun?
It remains wholly unclear to me what "the algorithm" you want might
be. As I mentioned before, if you store keys in sorted text files,
you can do intersection and difference very efficiently just by using
the Unix `comm` utility.
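For reference, `comm -12` prints only the lines common to both sorted files (intersection), and `comm -23` prints only the lines unique to the first file (difference). The same single-pass merge can be sketched in Python. This is an illustration of the idea, not a description of how `comm` is implemented. The name `merge_sorted` and the sample keys are made up, and the sketch assumes both inputs are already sorted in the same order that Python's string comparison uses.

```python
def merge_sorted(a, b):
    """Walk two sorted iterables once, yielding (column, key) pairs.

    Column numbers follow comm's convention:
    1 = only in a, 2 = only in b, 3 = in both.
    Assumes neither input contains None.
    """
    ia, ib = iter(a), iter(b)
    x, y = next(ia, None), next(ib, None)
    while x is not None and y is not None:
        if x < y:
            yield 1, x
            x = next(ia, None)
        elif x > y:
            yield 2, y
            y = next(ib, None)
        else:
            yield 3, x
            x, y = next(ia, None), next(ib, None)
    # One input is exhausted; whatever remains is unique to the other.
    while x is not None:
        yield 1, x
        x = next(ia, None)
    while y is not None:
        yield 2, y
        y = next(ib, None)


# Hypothetical sample data; in practice a and b could be open file
# objects over sorted text files, so only one line per file is in memory.
a = ["apple", "cherry", "kiwi"]
b = ["banana", "cherry", "kiwi", "plum"]

intersection = [key for col, key in merge_sorted(a, b) if col == 3]
difference = [key for col, key in merge_sorted(a, b) if col == 1]
print(intersection)  # ['cherry', 'kiwi']
print(difference)    # ['apple']
```

Because each input is read once and in order, memory use stays constant no matter how large the files are, which is the point of keeping the keys sorted on disk.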