heapq.merge with key=

Peter Otten __peter__ at web.de
Sat May 9 04:09:24 EDT 2009


Kevin D.  Smith wrote:

> On 2009-05-07 23:48:43 -0500, Chris Rebert <clp2 at rebertia.com> said:
> 
>> On Thu, May 7, 2009 at 2:23 PM, Kevin D. Smith <Kevin.Smith at sas.com>
>> wrote:
>>> I need the behavior of heapq.merge to merge a bunch of results from a
>>> database.  I was doing this with sorted(itertools.chain(...), key=
>> ...), but
>>> I would prefer to do this with generators.  My issue is that I need
>> the key
>>> argument to sort on the correct field in the database.  heapq.merge
>> doesn't
>>> have this argument and I don't understand the code enough to know if
>>> it's possible to add it.  Is this enhancement possible without
>>> drasticall
>> y
>>> changing the current code?
>> 
>> I think so. Completely untested code:
>> 
>> def key(ob):
>>     #code here
>> 
>> class Keyed(object):
>>     def __init__(self, obj):
>>         self.obj = obj
>>     def __cmp__(self, other):
>>         return cmp(key(self.obj), key(other.obj))
>> 
>> def keyify(gen):
>>     for item in gen:
>>         yield Keyed(item)
>> 
>> def stripify(gen):
>>     for keyed in gen:
>>         yield keyed.obj
>> 
>> merged = stripify(merge(keyify(A), keyify(B), keyify(C))) #A,B,C being
>> the iterables
> 
> Ah, that's not a bad idea.  I think it could be simplified by letting
> Python do the comparison work as follows (also untested).
> 
> def keyify(gen, key=lamda x:x):
>     for item in gen:
>         yield (key(item), item)
> 
> def stripify(gen):
>     for item in gen:
>         yield item[1]

This is not as general. It makes sorting unstable and can even fail if the 
items are unorderable:

>>> def decorate(gen, key):
...     for item in gen:
...             yield key(item), item
...
>>> def undecorate(gen):
...     for item in gen:
...             yield item[-1]
...
>>> sorted(decorate([1j, -1j, 1+1j], key=lambda c: c.real))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers

One way to fix it:

>>> def decorate(gen, key):
...     for index, item in enumerate(gen):
...             yield key(item), index, item
...
>>> sorted(decorate([1j, -1j, 1+1j], key=lambda c: c.real))
[(0.0, 0, 1j), (0.0, 1, -1j), (1.0, 2, (1+1j))]
>>> list(undecorate(_))
[1j, -1j, (1+1j)]

Peter




More information about the Python-list mailing list