Merging sorted lists/iterators/generators into one stream of values...

Alex Martelli aleax at mail.comcast.net
Mon Oct 10 10:34:50 EDT 2005


George Sakkis <gsakkis at rutgers.edu> wrote:
   ...
> > i.e., a heap solution may be over 4 times faster than a sort-based one
> > (in the following implementations).
> 
> Interesting; I thought timsort on small almost ordered lists would be
> practically as fast as the heap. Still, how is 0.10344 over 4 times faster
> than 0.247116 ?

Oops, 2.5 times (for 10 streams) -- the ratio keeps getting bigger with
the number of streams, the figure 4 I had in mind was probably from
comparisons with 50 streams or so.

timsort needs N comparisons even if the list is already ordered (just to
check it is) while heaps can do with log(N) comparisons or even less in
the best case -- that's the key issue here.

> >   sources = [[s.next(), i, s.next] for i, s in enumerate(streams)]
   ...
> Indeed, these are almost equivalent as far as readability goes; the
> previous examples in the thread were less clear. By the way, why do you
> need 'i' and enumerate above ?

Making sure that, if the first items are identical, the sorting (or
comparison for heap) does not compare the _methods_ -- no big deal if it
does, but that doesn't logically make sense.  If we had key= arguments,
that would ENSURE no such comparison, but heaps didn't support that in
2.4 and I wanted to keep the playing field level, so I used the above
old trick to ensure stable sorting without comparisons beyond a given
field (more details were explained in the 1st edition of the Cookbook,
but I hope I give the general idea).


Alex



More information about the Python-list mailing list