Merging sorted lists/iterators/generators into one stream of values...

Lasse Vågsæther Karlsen lasse at vkarlsen.no
Thu Oct 6 13:42:27 EDT 2005


I need to merge several sources of values into one stream of values. All 
of the sources are sorted already and I need to retrieve the values from 
them all in sorted order.

In other words:
s1 = [10, 20, 30, 40, 50]
s2 = [15, 25]
s3 = [17, 27, 37]

for value in ???(s1, s2, s3):
     print value

will print out 10, 15, 17, 20, 25, 27, 30, 37, 40, 50 in that order.

The sources are cursors retrieving data from several databases, not from 
the same server, and there's a potential for a large number of rows from 
several of the sources. As such, any method that would load it all into 
memory and sort it is no good as it would too much memory.

Is there a function or whatnot in Python that will do what I want? I 
have come up with my own method but since it uses generators I'm not 
sure how much overhead there is. Additionally, since the values 
retrieved from the cursors will be dictionaries of "fieldname":value 
pairs, the method would either need to handle that construct (and be 
told what fieldname to sort on), or be able to take a function object to 
use for the comparison operation.

Basically, I'll use my own function for this unless someone can point me 
to something that is already available. Couldn't seem to find anything 
in the builtin modules but I didn't find glob.glob when I was looking 
for how to find filenames either so who knows what it's called :)

Since I need to deal with a variable list of sources (that is, 2 in one 
application, 3 in another and 4-5 in a third), a simple 2-source method 
isn't enough but if it's better than what I do for 2 sources then I can 
make a wrapper for it, since that's what I do anyway.

-- 
Lasse Vågsæther Karlsen
http://usinglvkblog.blogspot.com/
mailto:lasse at vkarlsen.no
PGP KeyID: 0x2A42A1C2



More information about the Python-list mailing list