Merging sorted lists/iterators/generators into one stream of values...
Alex Martelli
aleax at mail.comcast.net
Mon Oct 10 02:05:47 EDT 2005
George Sakkis <gsakkis at rutgers.edu> wrote:
...
> > manipulation of a heap to place an item in the right spot, but with 4-5
> > or a few more sources might not make an impact at all.
>
> Unless you're talking about hundreds or thousands sources, it probably
> won't. I would still go for the heap solution since IMO the resulting
> code it's more readable and easier to understand.
I'm not so sure about either sentence...:
Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
--how=S
Best time for 10 loops: 0.247116088867
Helen:~/pynut/samp alex$ python merger.py --numstreams=10 --minlen=100
--how=H
Best time for 10 loops: 0.10344004631
i.e., a heap solution may be over 4 times faster than a sort-based one
(in the following implementations). Readability seems quite comparable
(skipping the rest of the infrastructure, which generates random sorted
streams and ensures a stream is exhausted and verifies it etc etc):
def merge_by_sort(streams):
sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
while sources:
sources.sort(reverse=True)
best_source = sources[-1]
yield best_source[0]
try: best_source[0] = best_source[-1]()
except StopIteration: sources.pop()
def merge_by_heap(streams):
sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
heapq.heapify(sources)
while sources:
best_source = sources[0]
yield best_source[0]
try: best_source[0] = best_source[-1]()
except StopIteration: heapq.heappop(sources)
else: heapq.heapreplace(sources, best_source)
Hmmm, I wonder if something like merge_by_heap would be a good candidate
for itertool. Raymond...?
Alex
More information about the Python-list
mailing list