Writing huge Sets() to disk
Bengt Richter
bokr at oz.net
Tue Jan 11 03:42:39 EST 2005
On Mon, 10 Jan 2005 17:11:09 +0100, Martin MOKREJŠ <mmokrejs at ribosome.natur.cuni.cz> wrote:
>Hi,
> I have sets.Set() objects having up to 20E20 items,
What notation are you using when you write 20E20?
IOW, ISTM 1E9 is a billion. So 20E20 would be 2000 billion billion.
Please clarify ;-)
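(For reference, Python itself reads those two notations very differently: 20E20 is a float literal meaning 20 * 10**20, while "20 to the 20th power" is the integer 20**20 -- a quick check at the interpreter:)

```python
# 20E20 is float scientific notation: 20 * 10**20, i.e. 2e21.
# "20 to the 20th power" is the integer 20**20, a much larger number.
print(20e20)     # 2e+21
print(20 ** 20)  # 1048576 followed by 20 zeros (about 1.05e26)
```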
>each is composed of up to 20 characters. Keeping
>them in memory on a 1 GB machine puts me quickly into swap.
>I don't want to use the dictionary approach, as I see no sense
>in storing None as a value. The items in a set are unique.
>
> How can I write them efficiently to disk? To be more exact,
>I have 20 sets. _set1 has 1E20 keys of size 1 character.
>
>alphabet = ('G', 'A', 'V', 'L', 'I', 'P', 'S', 'T', 'C', 'M', 'A', 'Q', 'F', 'Y', 'W', 'K', 'R', 'H', 'D', 'E')
>for aa1 in alphabet:
> # l = [aa1]
> #_set1.add(aa1)
> for aa2 in alphabet:
> # l.append(aa2)
> #_set2.add(''.join(l))
>[cut]
>
> The reason I went for sets instead of lists is the speed,
>availability of unique, common and other methods.
>What would you propose as an elegant solution?
>Actually, even those nested for loops take ages. :(
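For what it's worth, strings of a fixed length over a fixed alphabet can be generated lazily and streamed straight to disk, so no huge set ever has to sit in memory. A sketch using itertools.product (available in recent Pythons; note your quoted alphabet lists 'A' twice, where presumably 'N' was intended, so I've used the 20 standard amino-acid letters):

```python
from itertools import product

# The 20 standard amino-acid one-letter codes (the quoted alphabet
# repeats 'A'; 'N' was presumably intended).
alphabet = ('G', 'A', 'V', 'L', 'I', 'P', 'S', 'T', 'C', 'M',
            'N', 'Q', 'F', 'Y', 'W', 'K', 'R', 'H', 'D', 'E')

def words(length):
    """Lazily yield every length-character string over the alphabet."""
    for combo in product(alphabet, repeat=length):
        yield ''.join(combo)

# Stream to disk one line at a time instead of materializing a Set:
# with open('words2.txt', 'w') as out:
#     for w in words(2):
#         out.write(w + '\n')

print(sum(1 for _ in words(2)))  # 400 (== 20**2), generated one at a time
```

Since product already yields each combination exactly once over a duplicate-free alphabet, the results are unique by construction and the uniqueness guarantee of a Set costs you nothing here.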
If you will explain a little what you are doing with these set "items"
perhaps someone will think of another way to represent and use your data.
Regards,
Bengt Richter