Merging sorted lists/iterators/generators into one stream of values...

Alex Martelli aleax at mail.comcast.net
Mon Oct 10 02:05:47 EDT 2005


George Sakkis <gsakkis at rutgers.edu> wrote:
   ...
> > manipulation of a heap to place an item in the right spot, but with 4-5
> > or a few more sources might not make an impact at all.
> 
> Unless you're talking about hundreds or thousands sources, it probably
> won't. I would still go for the heap solution since IMO the resulting
> code it's more readable and easier to understand.

I'm not so sure about either sentence...:

Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
--how=S
Best time for 10 loops: 0.247116088867

Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
--how=H
Best time for 10 loops: 0.10344004631

i.e., a heap solution may be over 4 times faster than a sort-based one
(in the following implementations).  Readability seems quite comparable
(skipping the rest of the infrastructure, which generates random sorted
streams and ensures a stream is exhausted and verifies it etc etc):

def merge_by_sort(streams):
  sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
  while sources:
    sources.sort(reverse=True)
    best_source = sources[-1]
    yield best_source[0]
    try: best_source[0] = best_source[-1]()
    except StopIteration: sources.pop()

def merge_by_heap(streams):
  sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
  heapq.heapify(sources)
  while sources:
    best_source = sources[0]
    yield best_source[0]
    try: best_source[0] = best_source[-1]()
    except StopIteration: heapq.heappop(sources)
    else: heapq.heapreplace(sources, best_source)


Hmmm, I wonder if something like merge_by_heap would be a good candidate
for itertool.  Raymond...?


Alex



More information about the Python-list mailing list