Writing huge Sets() to disk

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Mon Jan 10 11:11:09 EST 2005


Hi,
  I have sets.Set() objects having up to 20E20 items,
each composed of up to 20 characters. Keeping
them in memory on a 1GB machine quickly puts me into swap.
I don't want to use the dictionary approach, as I see no sense
in storing None as a value. The items in a set are unique.

  How can I write them efficiently to disk? To be more exact,
I have 20 sets. _set1 has 1E20 keys of size 1 character.
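
For a rough sense of scale, here is a minimal sketch of the arithmetic. It assumes each set holds every string of one fixed length over a 20-letter alphabet, so the set for length k has 20**k items; the names word_count and raw_bytes are hypothetical helpers, not part of any library.

```python
ALPHABET_SIZE = 20  # assumption: 20 distinct one-letter codes


def word_count(length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct strings of the given length."""
    return alphabet_size ** length


def raw_bytes(length, alphabet_size=ALPHABET_SIZE):
    """Bytes for the characters alone, ignoring any per-object overhead."""
    return word_count(length, alphabet_size) * length


# Print the item count and raw character bytes for a few lengths.
for length in (1, 2, 5, 10, 20):
    print(length, word_count(length), raw_bytes(length))
```

Under that assumption the larger sets cannot fit in memory or on disk at all, so only the shorter lengths are practical to materialize.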

alphabet = ('G', 'A', 'V', 'L', 'I', 'P', 'S', 'T', 'C', 'M', 'N', 'Q', 'F', 'Y', 'W', 'K', 'R', 'H', 'D', 'E')
_set1, _set2 = set(), set()  # sets.Set() on Python 2.3
for aa1 in alphabet:
    _set1.add(aa1)
    for aa2 in alphabet:
        # build each key from the loop variables, not from a growing list
        _set2.add(aa1 + aa2)
        # [cut]
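
One possible way to avoid holding the keys in memory is to generate them lazily and stream them straight to a file. The following is only a sketch under stated assumptions: it uses itertools.product from current Python versions, it assumes the alphabet is the 20 standard amino-acid one-letter codes with no duplicates (so every generated string is already unique and no set is needed for deduplication), and iter_words and write_words are hypothetical names.

```python
import itertools

# Assumed alphabet: the 20 standard amino-acid one-letter codes, no duplicates.
ALPHABET = "GAVLIPSTCMNQFYWKRHDE"


def iter_words(length, alphabet=ALPHABET):
    """Lazily yield every string of the given length over the alphabet."""
    for letters in itertools.product(alphabet, repeat=length):
        yield "".join(letters)


def write_words(path, length, alphabet=ALPHABET):
    """Stream all words of one length to a text file, one per line.

    Returns the number of words written. Only one word is held in
    memory at a time.
    """
    count = 0
    with open(path, "w") as handle:
        for word in iter_words(length, alphabet):
            handle.write(word + "\n")
            count += 1
    return count
```

For example, write_words("set2.txt", 2) would write all two-letter keys to a file named set2.txt (a hypothetical path). This only stays practical for short lengths, since the number of words grows by a factor of 20 with each extra character.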

  The reason I went for sets instead of lists is speed and the
availability of unique, common and other methods.
What would you propose as an elegant solution?
Actually, even those nested for loops take ages. :(
M.


