Writing huge Sets() to disk

Bengt Richter bokr at oz.net
Tue Jan 11 03:42:39 EST 2005


On Mon, 10 Jan 2005 17:11:09 +0100, Martin MOKREJŠ <mmokrejs at ribosome.natur.cuni.cz> wrote:

>Hi,
>  I have sets.Set() objects having up to 20E20 items,
What notation are you using when you write 20E20?
IOW, ISTM 1E9 is a billion. So 20E20 would be 2000 billion billion.
Please clarify ;-)
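
To make the arithmetic concrete, here is a small sketch of two possible readings of "20E20". The second reading (20 to the 20th power, i.e. every 20-character string over a 20-letter alphabet) is only a guess at what may have been meant, not something the original post states; the variable names are made up for illustration.

```python
# Reading 1: Python's float-literal notation, where 20E20 means 20 * 10**20.
as_written = 20 * 10**20                 # exact integer form of 20E20
billion = 10**9
billion_billions = as_written // (billion * billion)   # how many "billion billion"

# Reading 2 (an assumption): 20 to the 20th power, the number of distinct
# 20-character strings over a 20-letter alphabet.
all_length_20 = 20**20

# Under the same assumption: every string of length 1 through 20.
up_to_length_20 = sum(20**k for k in range(1, 21))

print(billion_billions)
print(all_length_20)
print(up_to_length_20)
```

Either way, the count is far beyond what fits in memory or on disk if every item is stored explicitly, which is why the notation matters.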

>each is composed of up to 20 characters. Keeping
>them in memory on a 1GB machine quickly puts me into swap.
>I don't want to use dictionary approach, as I don't see a sense
>to store None as a value. The items in a set are unique.
>
>  How can I write them efficiently to disk? To be more exact,
>I have 20 sets. _set1 has 1E20 keys of size 1 character.
>
>alphabet = ('G', 'A', 'V', 'L', 'I', 'P', 'S', 'T', 'C', 'M', 'N', 'Q', 'F', 'Y', 'W', 'K', 'R', 'H', 'D', 'E')
>for aa1 in alphabet:
>    # l = [aa1]
>    #_set1.add(aa1)
>    for aa2 in alphabet:
>        # l.append(aa2)
>        #_set2.add(''.join(l))
>[cut]
>
>  The reason I went for sets instead of lists is the speed,
>availability of unique, common and other methods.
>What would you propose as an elegant solution?
>Actually, even those nested for loops take ages. :(

If you will explain a little what you are doing with these set "items"
perhaps someone will think of another way to represent and use your data.
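
As one example of what "another way" might look like: if (and this is purely an assumption, since the post does not say) each set is meant to hold every string of a given length over a 20-letter alphabet of distinct letters, then nothing needs to be stored at all. Membership can be computed, each string maps to a base-20 integer, and the strings can be generated lazily. The sketch below uses the 20 standard amino-acid one-letter codes as the assumed alphabet; `is_member`, `encode`, `decode` and `iter_strings` are hypothetical helper names, and `itertools.product` requires a reasonably modern Python.

```python
from itertools import islice, product

# Assumed alphabet: the 20 standard amino-acid one-letter codes, all distinct.
ALPHABET = "GAVLIPSTCMNQFYWKRHDE"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def is_member(s, length):
    """True if s consists of exactly `length` letters from ALPHABET."""
    return len(s) == length and all(ch in INDEX for ch in s)


def encode(s):
    """Map a string to its rank, reading it as a base-20 number."""
    n = 0
    for ch in s:
        n = n * len(ALPHABET) + INDEX[ch]
    return n


def decode(n, length):
    """Inverse of encode for strings of the given length."""
    chars = []
    for _ in range(length):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def iter_strings(length):
    """Lazily yield every string of the given length, one at a time."""
    for letters in product(ALPHABET, repeat=length):
        yield "".join(letters)


# Take only the first few items; the full sequence is never materialized.
first_three = list(islice(iter_strings(2), 3))
print(first_three)
print(encode("GA"), decode(encode("WKRH"), 4))
```

If the real sets are only subsets of that space, the same integer encoding might still help, since a rank is more compact than a string, but that depends on what is actually being done with the items.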

Regards,
Bengt Richter


