heapq.merge with key=

Scott David Daniels Scott.Daniels at Acm.Org
Sat May 9 12:51:24 EDT 2009


Peter Otten wrote:
> Kevin D.  Smith wrote:
> 
>> On 2009-05-07 23:48:43 -0500, Chris Rebert <clp2 at rebertia.com> said:
>>
>>> On Thu, May 7, 2009 at 2:23 PM, Kevin D. Smith <Kevin.Smith at sas.com>
>>> wrote:
>>>> I need the behavior of heapq.merge to merge a bunch of results from a
>>>> database.... I would prefer to do this with generators....  I need
>>>> the key argument to sort on the correct field in the database...
>>> I think so.... [some code]
>> Ah, that's not a bad idea.  I think it could be simplified by letting
>> Python do the comparison work as follows (also untested).
>>
>> def keyify(gen, key=lamda x:x):
>>     for item in gen:
>>         yield (key(item), item)
> ... This is not as general. It makes sorting unstable and can even fail 
> if the items are unorderable:
> One way to fix it:
>>>> def decorate(gen, key):
> ...     for index, item in enumerate(gen):
> ...             yield key(item), index, item
>>>> sorted(decorate([1j, -1j, 1+1j], key=lambda c: c.real))
> [(0.0, 0, 1j), (0.0, 1, -1j), (1.0, 2, (1+1j))]
>>>> list(undecorate(_))
> [1j, -1j, (1+1j)]

However, a heap gets faster if equal items are equal, so you
really want the decorated value to be the sole component of a
comparison.  (and remember, sorts are now stable).
As in your code, key generation is likely best calculated once.
Here's another fix:

     class Keyed(object):
         def __init__(self, obj, key):
             self.obj = obj
             self.key = key

         def __lt__(self, other): # cmp is old-style magic
             return self.key < other.key

         def __eq__(self, other):
             return self.key == other.key

         def __repr__(self):
             return '%s(%r, %r)' % (
		self.__class__.__name__, self.obj, self.key)

     def kmerge(key, *streams):
         keyed_streams = [(Keyed(obj, key(obj)) for obj in stream)
                          for stream in streams]
         for element in heapq.merge(*keyed_streams):
             yield element.obj

--Scott David Daniels
Scott.Daniels at Acm.Org



More information about the Python-list mailing list