Writing huge Sets() to disk

Adam DePrince adam at cognitcorp.com
Mon Jan 10 13:28:32 EST 2005


On Mon, 2005-01-10 at 11:11, Martin MOKREJŠ wrote:
> Hi,
>   I have sets.Set() objects having up to 20E20 items,
> each is composed of up to 20 characters. Keeping
> them in memory on a 1GB machine puts me quickly into swap.
> I don't want to use dictionary approach, as I don't see a sense
> to store None as a value. The items in a set are unique.

Let's be realistic.  Your house is on fire and you are remodeling the
basement.

Assuming you are on a 64-bit machine with full 64-bit addressing, your
absolute upper limit on the size of a set is 2^64, or
18446744073709551616, bytes.  Your real upper limit is at least an order
of magnitude smaller.

You are asking us how to store 20E20, or 2000000000000000000000, items
in a Set.  That is still an order of magnitude greater than the number
of *bits* you can address.  Your desktop might not be able to enumerate
all of these strings in your lifetime, much less index and store them.
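
For anyone who wants to check the arithmetic, here is a rough sketch.
The 20 bytes per item (one byte per character, no container overhead)
and the rate of one billion items per second are my own assumptions for
illustration, not measurements; the function name is made up too.

```python
# Back-of-the-envelope check of the numbers discussed above.
ADDRESSABLE_BYTES = 2 ** 64              # full 64-bit addressing
ADDRESSABLE_BITS = ADDRESSABLE_BYTES * 8
ITEMS = 20 * 10 ** 20                    # "20E20" items from the question

BYTES_PER_ITEM = 20        # assumption: 20 one-byte characters, zero overhead
ITEMS_PER_SECOND = 10 ** 9  # assumption: a generous enumeration rate
SECONDS_PER_YEAR = 365 * 24 * 3600


def summarize():
    """Return the three ratios that make the request unrealistic."""
    return {
        # How many items there are per addressable *bit*.
        "items_per_addressable_bit": ITEMS / ADDRESSABLE_BITS,
        # Bytes needed for the raw characters alone.
        "raw_payload_bytes": ITEMS * BYTES_PER_ITEM,
        # Years needed just to generate each item once.
        "years_to_enumerate": ITEMS / ITEMS_PER_SECOND / SECONDS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

Under those assumptions the item count exceeds the number of
addressable bits by a bit more than a factor of ten, and merely
enumerating the items takes tens of thousands of years.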

We might as well be discussing the number of angels that can sit on the
head of a pin.  Any discussion of a list vs. Set/dict is a
micro-optimization dwarfed by the fact that no machine exists that can
hold this data.  The choice of Set vs. dict is an even less
important matter of syntactic sugar.

To me, it sounds like you are taking an AI class and trying to deal with
a small search space by brute force.  First, stop banging your head
against the wall algorithmically.  Nobody lost their job for saying NP
!= P.  Then tell us what you are trying to do; perhaps there is a better
way, perhaps the problem is unsolvable and there is a heuristic that
will satisfy your needs. 



Adam DePrince 




