Merging sorted lists/iterators/generators into one stream of values...

Mike C. Fletcher mcfletch at rogers.com
Thu Oct 6 20:05:01 EDT 2005


Lasse Vågsæther Karlsen wrote:

>I need to merge several sources of values into one stream of values. All 
>of the sources are sorted already and I need to retrieve the values from 
>them all in sorted order.
>
>In other words:
>s1 = [10, 20, 30, 40, 50]
>s2 = [15, 25]
>s3 = [17, 27, 37]
>
>for value in ???(s1, s2, s3):
>     print value
>
>will print out 10, 15, 17, 20, 25, 27, 30, 37, 40, 50 in that order.
>
>The sources are cursors retrieving data from several databases, not from 
>the same server, and there's a potential for a large number of rows from 
>several of the sources. As such, any method that would load it all into 
>memory and sort it is no good as it would too much memory.
>
>Is there a function or whatnot in Python that will do what I want? I 
>have come up with my own method but since it uses generators I'm not 
>sure how much overhead there is. Additionally, since the values 
>retrieved from the cursors will be dictionaries of "fieldname":value 
>pairs, the method would either need to handle that construct (and be 
>told what fieldname to sort on), or be able to take a function object to 
>use for the comparison operation.
>
>Basically, I'll use my own function for this unless someone can point me 
>to something that is already available. Couldn't seem to find anything 
>in the builtin modules but I didn't find glob.glob when I was looking 
>for how to find filenames either so who knows what it's called :)
>
>Since I need to deal with a variable list of sources (that is, 2 in one 
>application, 3 in another and 4-5 in a third), a simple 2-source method 
>isn't enough but if it's better than what I do for 2 sources then I can 
>make a wrapper for it, since that's what I do anyway.
>  
>
I doubt you'll find a prebuilt one, it's fairly easy to create your own, 
after all.  Generators are fairly fast constructs in Python, btw.  
Here's what I whipped up in a few minutes:

def firstIter( value ):
    it = iter( value )
    try:
        return it.next(), it
    except StopIteration, err:
        return None, None

def inorder( comparision, *args ):
    iterators = [
        [value,it]
        for (value,it) in [ firstIter( arg ) for arg in args ]
        if it is not None
    ]
    iterators.sort()
    while iterators:
        yield iterators[0][0]
        try:
            value = iterators[0][0] = iterators[0][1].next()
        except StopIteration, err:
            iterators.pop(0)
        else:
            if len(iterators) > 1 and comparision( value, 
iterators[1][0]) == 1:
                iterators.sort()
                continue

if __name__ == "__main__":
    s1 = [10, 20, 30, 40, 50]
    s2 = [15, 25]
    s3 = [17, 27, 37]
    s4 = []
    for value in inorder(cmp, s1, s2, s3, s4):
        print value


Anyway, have fun,
Mike

-- 
________________________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://www.vrplumber.com
  http://blog.vrplumber.com




More information about the Python-list mailing list