Merging sorted lists/iterators/generators into one stream of values...

Lasse Vågsæther Karlsen lasse at vkarlsen.no
Tue Oct 11 07:12:06 EDT 2005


<snip>

Another idea for this method would be that in some cases I noticed that 
it was useful to know which source each element would come from as well, 
as well as removing duplicates from the results.

For instance

s1 = [1, 3, 5, 7]
s2 = [2, 3, 4]

for k, s in merge_by_sort(s1, s2):
     print k, "from source", s

this would print:

1 from source 0
2 from source 1
3 from source 1
3 from source 0
4 from source 1
5 from source 0
7 from source 0

and the above list has 3 twice, so possibly:

1 from sources [0]
2 from sources [1]
3 from sources [0, 1]
4 from sources [1]
5 from sources [0]
7 from sources [0]

This latter one would be a slightly more heavy method as it would have 
to compare the N first elements of the list or heap to figure out what 
indices to yield as well.

However, the previous solution could be:

def merge_by_sort(*sources, **options):
     if "cmp" in options:
         comparison = options["cmp"]
     else:
         comparison = cmp

     iterables = []
     for index, source in enumerate(sources):
         try:
             source = iter(source)
             iterables.append([source.next(), index, source])
         except StopIteration:
             pass

     iterables.sort(cmp=comparison, key=lambda x: x[0], reverse=True)
     while iterables:
         yield iterables[-1][0], iterables[-1][1]
         try:
             iterables[-1][0] = iterables[-1][2].next()
             if len(iterables) > 1 and comparison(iterables[-1][0], 
iterables[-2][0]) > 0:
                 iterables.sort(comparison, key=lambda x: x[0], 
reverse=True)
         except StopIteration:
             iterables.pop(-1)

-- 
Lasse Vågsæther Karlsen
http://usinglvkblog.blogspot.com/
mailto:lasse at vkarlsen.no
PGP KeyID: 0x2A42A1C2



More information about the Python-list mailing list