fastest way to merge lists.

Roy Smith roy at popmail.med.nyu.edu
Sat Jan 8 21:25:59 EST 2000


I've got 5 (unsorted) lists of strings.  Each list has O(10^4) strings 
in it.  Within each list, each string appears exactly once.  Also, there 
is a large overlap between the lists.

I want to form a single list which contains exactly one copy of each 
string from the combined lists.  In unix, this would be:

cat list[12345] | sort | uniq

I'm not yet sure if care if the final list is sorted or not.  For now, I 
think it doesn't matter.  What's the fastest way to do that?  It seems 
to me:

d = {}
for s in list1:
   d[s] = 1
for s in list2:
   d[s] = 1
for s in list3:
   d[s] = 1
for s in list4:
   d[s] = 1
for s in list5:
   d[s] = 1
list = d.keys()
del d   # if I care about memory

would be pretty good.  Anything better?

Now, if I want it sorted, I could just sort the above list.

I suspect that if I knew ahead of time I wanted the final list sorted, 
sorting each individual list then doing a linear merge might make more 
sense, if there was little overlap between the lists.  But, since I know 
they overlap greatly (i.e. len(list) will be approximately the same as 
len(listN)), I'm guessing doing it as described above and sorting the 
result might just work out faster.



More information about the Python-list mailing list