fastest way to merge lists.
Roy Smith
roy at popmail.med.nyu.edu
Sat Jan 8 21:25:59 EST 2000
I've got 5 (unsorted) lists of strings. Each list has O(10^4) strings
in it. Within each list, each string appears exactly once. Also, there
is a large overlap between the lists.
I want to form a single list which contains exactly one copy of each
string from the combined lists. In unix, this would be:
cat list[12345] | sort | uniq
I'm not yet sure if care if the final list is sorted or not. For now, I
think it doesn't matter. What's the fastest way to do that? It seems
to me:
d = {}
for s in list1:
d[s] = 1
for s in list2:
d[s] = 1
for s in list3:
d[s] = 1
for s in list4:
d[s] = 1
for s in list5:
d[s] = 1
list = d.keys()
del d # if I care about memory
would be pretty good. Anything better?
Now, if I want it sorted, I could just sort the above list.
I suspect that if I knew ahead of time I wanted the final list sorted,
sorting each individual list then doing a linear merge might make more
sense, if there was little overlap between the lists. But, since I know
they overlap greatly (i.e. len(list) will be approximately the same as
len(listN)), I'm guessing doing it as described above and sorting the
result might just work out faster.
More information about the Python-list
mailing list