Merging sorted lists/iterators/generators into one stream of values...

George Sakkis gsakkis at rutgers.edu
Mon Oct 10 09:19:58 EDT 2005


"Alex Martelli" <aleax at mail.comcast.net> wrote:

> George Sakkis <gsakkis at rutgers.edu> wrote:
>    ...
> > > manipulation of a heap to place an item in the right spot, but with 4-5
> > > or a few more sources might not make an impact at all.
> >
> > Unless you're talking about hundreds or thousands sources, it probably
> > won't. I would still go for the heap solution since IMO the resulting
> > code it's more readable and easier to understand.
>
> I'm not so sure about either sentence...:
>
> Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
> --how=S
> Best time for 10 loops: 0.247116088867
>
> Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
> --how=H
> Best time for 10 loops: 0.10344004631
>
> i.e., a heap solution may be over 4 times faster than a sort-based one
> (in the following implementations).

Interesting; I thought timsort on small almost ordered lists would be practically as fast as the
heap. Still, how is 0.10344 over 4 times faster than 0.247116 ?

> Readability seems quite comparable
> (skipping the rest of the infrastructure, which generates random sorted
> streams and ensures a stream is exhausted and verifies it etc etc):
>
> def merge_by_sort(streams):
>   sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
>   while sources:
>     sources.sort(reverse=True)
>     best_source = sources[-1]
>     yield best_source[0]
>     try: best_source[0] = best_source[-1]()
>     except StopIteration: sources.pop()
>
> def merge_by_heap(streams):
>   sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
>   heapq.heapify(sources)
>   while sources:
>     best_source = sources[0]
>     yield best_source[0]
>     try: best_source[0] = best_source[-1]()
>     except StopIteration: heapq.heappop(sources)
>     else: heapq.heapreplace(sources, best_source)

Indeed, these are almost equivalent as far as readability goes; the previous examples in the thread
were less clear. By the way, why do you need 'i' and enumerate above ?

> Hmmm, I wonder if something like merge_by_heap would be a good candidate
> for itertool.  Raymond...?
>
>
> Alex

Yes, it would be nice to make it into 2.5.

George





More information about the Python-list mailing list