how to convert code that uses cmp to python3

Chris Angelico rosuav at gmail.com
Sat Apr 9 10:41:16 EDT 2016


On Sun, Apr 10, 2016 at 12:25 AM, Antoon Pardon
<antoon.pardon at rece.vub.ac.be> wrote:
> Let me give you an artificial example to show what can happen. The keys are all
> iterables of equal lengths with integers as elements.
>
> Then this could be the cmp function:
>
> def cmp(ob1, ob2):
>     itr1 = iter(ob1)
>     itr2 = iter(ob2)
>     for el1, el2 in zip(itr1, itr2):
>         delta = el1 - el2
>         if delta != 0:
>             return delta
>     return 0
>
> Now maybe I'm missing something, but I don't see how implementing __lt__ and
> __eq__ in this case will be much different from largely reimplementing cmp.
> So if you can work with a cmp function, you need to traverse the two
> iterables only once. If you do it with the __lt__ and __eq__ methods, you
> will have to traverse them twice.
>
> Now this probably is not a problem most of the time, but when you work with
> trees, this kind of comparison to make a three-way decision happens often, and
> the lower you descend in the tree, the closer your argument will be to the
> keys of the nodes you visit, making it more and more probable that you have a
> large common prefix.
>
> So in these circumstances it is more likely to be problematic.

In this case, you're likely to end up with large branches of your tree
that have the same prefix. (And if you don't, your iterations are all
going to end early anyway, so the comparison is cheap.) A data
structure that takes this into account will out-perform the naive
comparison model every time. In fact, a simple dict will probably
out-perform your tree; by definition, your iterables have to be
stable, which means you could simply turn them into tuples and use
them as dict keys. Lookups and insertions would both require one pass
over the current key to calculate its hash, then a bucket lookup;
absent a pathological situation with hash collisions, this will result
in FAR better performance than any tree lookup ever will.
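
As a rough illustration of the dict approach, here is a minimal sketch. The
helper name freeze and the sample keys are made up for the example; the only
assumption carried over from the quoted message is that keys are equal-length
iterables of integers.

```python
# Hypothetical helper: turn any stable iterable of integers into a
# hashable tuple so it can serve as a dict key.
def freeze(iterable):
    return tuple(iterable)

table = {}

# Insertion: one pass over the key to build the tuple, then hashing
# and a bucket lookup inside the dict.
table[freeze([1, 2, 3])] = "first"
table[freeze(iter([1, 2, 4]))] = "second"

# Lookup works the same way; no element-by-element comparison against
# other keys is needed unless hashes collide.
found = table.get(freeze([1, 2, 4]))
missing = table.get(freeze([9, 9, 9]))
```

Note that this gives you lookup by exact key only; a dict does not keep the
keys in sorted order.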

The advantage of the tree is that it requires *no* operations other
than "which of these is greater". But I'm hard-pressed to find an
object type that would plausibly be used in this way (it has to be
totally ordered, for a start - you can't use timestamp ranges, for
instance), yet can't take advantage of Python's massively-optimized
dictionary.
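
If ordering really is the only operation you have, the three-way function
from the quoted message can still be used in Python 3 through
functools.cmp_to_key, which wraps it in a key object. This is only a sketch:
the sample keys are invented, and the function is the quoted one with the
redundant iter() calls dropped.

```python
from functools import cmp_to_key

# Three-way comparison as in the quoted message: negative, zero or
# positive, decided in a single pass over both iterables.
def cmp(ob1, ob2):
    for el1, el2 in zip(ob1, ob2):
        delta = el1 - el2
        if delta != 0:
            return delta
    return 0

# Made-up sample keys of equal length.
keys = [(1, 2, 4), (1, 2, 3), (0, 9, 9)]

# cmp_to_key adapts the old-style function to the key= protocol.
ordered = sorted(keys, key=cmp_to_key(cmp))
```

For tuples of integers the built-in tuple ordering gives the same result, so
plain sorted(keys) would do here; the wrapper matters only when the keys
have no usable ordering of their own.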

ChrisA


