heapq.merge with key=

Fri May 8 10:49:23 EDT 2009

On 2009-05-07 23:48:43 -0500, Chris Rebert <clp2 at rebertia.com> said:

> On Thu, May 7, 2009 at 2:23 PM, Kevin D. Smith <Kevin.Smith at sas.com> wrote:
>> I need the behavior of heapq.merge to merge a bunch of results from a
>> database.  I was doing this with sorted(itertools.chain(...), key=...),
>> but I would prefer to do this with generators.  My issue is that I need
>> the key argument to sort on the correct field in the database.
>> heapq.merge doesn't have this argument and I don't understand the code
>> enough to know if it's possible to add it.  Is this enhancement possible
>> without drastically changing the current code?
> 
> I think so. Completely untested code:
> 
> def key(ob):
>     #code here
> 
> class Keyed(object):
>     def __init__(self, obj):
>         self.obj = obj
>     def __cmp__(self, other):
>         return cmp(key(self.obj), key(other.obj))
> 
> def keyify(gen):
>     for item in gen:
>         yield Keyed(item)
> 
> def stripify(gen):
>     for keyed in gen:
>         yield keyed.obj
> 
> # A, B, C being the iterables
> merged = stripify(merge(keyify(A), keyify(B), keyify(C)))

Ah, that's not a bad idea.  I think it could be simplified by letting 
Python do the comparison work as follows (also untested).

def keyify(gen, key=lambda x: x):
    for item in gen:
        yield (key(item), item)

def stripify(gen):
    for item in gen:
        yield item[1]

After looking at the heapq.merge code, it seems like something like
this could easily be added to that code: the next() calls would be
wrapped with the tuple-creating code above, and the yield would simply
return the item itself.  It would, of course, have to assume that the
iterables were already sorted using the same key, but that's better
than not having the key option at all.
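
Putting the pieces together, here is a rough sketch of the
decorate/merge/undecorate idea as a single generator.  It assumes
Python 3 syntax, and merge_with_key is a made-up name for illustration,
not part of heapq.  One change from the two-element tuples above: a
stream index and a position are added to each tuple as tie-breakers, so
that two items with equal keys never get compared directly (that would
fail for rows that don't define an ordering, such as dicts).

```python
import heapq


def merge_with_key(*iterables, key=lambda x: x):
    """Lazily merge iterables that are each already sorted by `key`.

    Hypothetical helper, not part of heapq.
    """
    def decorate(stream_index, iterable):
        # (key, stream_index, position) is unique per item, so tuple
        # comparison never reaches the item itself.
        for position, item in enumerate(iterable):
            yield key(item), stream_index, position, item

    decorated = [decorate(i, it) for i, it in enumerate(iterables)]
    for entry in heapq.merge(*decorated):
        yield entry[-1]  # undecorate: hand back the original item


# Example: two result sets, each already sorted on the "id" field.
a = [{"id": 1, "src": "a"}, {"id": 3, "src": "a"}]
b = [{"id": 1, "src": "b"}, {"id": 2, "src": "b"}]
merged = list(merge_with_key(a, b, key=lambda row: row["id"]))
```

As with the keyify/stripify version, this only works if every input is
already sorted on the same key; nothing here re-sorts the inputs.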

-- 
Kevin D. Smith