Merging sorted lists/iterators/generators into one stream of values...
George Sakkis
gsakkis at rutgers.edu
Mon Oct 10 09:19:58 EDT 2005
"Alex Martelli" <aleax at mail.comcast.net> wrote:
> George Sakkis <gsakkis at rutgers.edu> wrote:
> ...
> > > manipulation of a heap to place an item in the right spot, but with 4-5
> > > or a few more sources might not make an impact at all.
> >
> > Unless you're talking about hundreds or thousands sources, it probably
> > won't. I would still go for the heap solution since IMO the resulting
> > code it's more readable and easier to understand.
>
> I'm not so sure about either sentence...:
>
> Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
> --how=S
> Best time for 10 loops: 0.247116088867
>
> Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
> --how=H
> Best time for 10 loops: 0.10344004631
>
> i.e., a heap solution may be over 4 times faster than a sort-based one
> (in the following implementations).
Interesting; I thought timsort on small almost ordered lists would be practically as fast as the
heap. Still, how is 0.10344 over 4 times faster than 0.247116 ?
> Readability seems quite comparable
> (skipping the rest of the infrastructure, which generates random sorted
> streams and ensures a stream is exhausted and verifies it etc etc):
>
> def merge_by_sort(streams):
> sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
> while sources:
> sources.sort(reverse=True)
> best_source = sources[-1]
> yield best_source[0]
> try: best_source[0] = best_source[-1]()
> except StopIteration: sources.pop()
>
> def merge_by_heap(streams):
> sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
> heapq.heapify(sources)
> while sources:
> best_source = sources[0]
> yield best_source[0]
> try: best_source[0] = best_source[-1]()
> except StopIteration: heapq.heappop(sources)
> else: heapq.heapreplace(sources, best_source)
Indeed, these are almost equivalent as far as readability goes; the previous examples in the thread
were less clear. By the way, why do you need 'i' and enumerate above ?
> Hmmm, I wonder if something like merge_by_heap would be a good candidate
> for itertool. Raymond...?
>
>
> Alex
Yes, it would be nice to make it into 2.5.
George
More information about the Python-list
mailing list