__lt__ slowing the "in" operator even if not called

Emanuele Aina emanuele.aina at gmail.com
Thu Jun 15 08:14:31 EDT 2006


Maric Michaud spiegò:

> Le Mercredi 14 Juin 2006 22:57, Emanuele Aina a écrit :
> > Here you can see that even with only the __lt__ method it goes 10x
> > slower, but __lt__ is never called as "Foo" is not printed.
>
> No, that is not what it shows. The only thing it shows is that in operator is
> slow for lists, and as it iterates over the entire list untill it find an
> element, it is much slower when it fails a lot.

Yes, I now that "in" is slow for lists and that a dict is preferable,
but what I'm pointing out is that in one case is much slower without a
reason.

In my test, the one without __lt__ runs in ~500ms, while the one with
the __lt__ method runs in ~5000ms.

Just for comparision I've also tested with an overridden __eq__ and it
shows timings similar to the ones in the second test, but this time it
is expected, as "in" has to call __eq__ for every element in the list.

In the first test python uses the native equality (comparing memory
pointer) and it is fast because it is implemented in C in the
interpreter (no python function call).

In the last case python has to call __eq__ for every element, and a
python function call is obviuosly more expensive than the native
comparision.

But the __lt__ case leaves me wondering why it is slow as the __eq__
case, while using the native comparision.

What I asked is what is the difference between the first and the second
case?

> Use dict in operator instead.

Yes, I know that a dict is the right datastructure for a fast "in", but
I need to keep track of the ordering, so I cannot use a dict.


> Here is a program to show this in details :
>
> import timing
>
> class State(object):
>     def __init__(self, value):
>         self.v = value
>
> class StateEQ(State):
>     def __eq__ (self, other):
>         if isinstance(other, State) :
>             return self.v == other.v
>         else :
>             return self.v == other
>
> class StateLT(State) :
>     def __lt__ (self, other):
>         if isinstance(other, State) :
>             return self.v < other.v
>         else :
>             return self.v < other
>
> class StateLTEQ(StateEQ, StateLT) : pass
>
> def test(state_cls):
>     num_elem = 3*10**4
>     print
>     print state_cls
>
>     l = [state_cls(0)]
>     for i in xrange(num_elem): l.append(state_cls(i+1))
>     print
>     print "in operator at the beginning of list:",
>     timing.start()
>     for i in xrange(300): i in l
>     timing.finish()
>     print timing.milli()
>     print "in operator at the end of list:",
>     timing.start()
>     for i in xrange(num_elem-300, num_elem): i in l
>     timing.finish()
>     print timing.milli()

Sorry, but this works only when you override __eq__.

Without it it will alway scan the entire list, so you are comparing the
timings of two different things.

>     print
>     print "converting to dict :",
>     timing.start()
>     d = {}.fromkeys(l)
>     timing.finish()
>     print timing.milli()
>     print "in operator for a dict for %s elements:" % (num_elem*2),
>     timing.start()
>     for i in xrange(num_elem, num_elem*2): i in d
>     timing.finish()
>     print timing.milli()

As I said, that was only a test demo exploiting my observation, in my
real program I need to keep track of the order in which I add items in
the list, so I can't use a dict.

> for which the output is :
>
> <class '__main__.State'>
>
> in operator at the beginning of list: 1983
> in operator at the end of list: 2011
>
> converting to dict : 82
> in operator for a dict for 60000 elements: 12
>
> <class '__main__.StateLT'>
>
> in operator at the beginning of list: 3866
> in operator at the end of list: 3871

Here python is using the equality comparator from "int", giving those
fast results.
I've substituted the ints with instances of state_cls:
-    for i in xrange(300): i in l
+    for i in xrange(300): state_cls(i) in l

-    for i in xrange(num_elem-300, num_elem): i in l
+    for i in xrange(num_elem-300, num_elem): state_cls(i) in l

Running these test, in the case of StateLT, now takes ~10000ms and
~9000ms.

[...]

> Every classes that define the __eq__ operator will find quickly the elements
> if they are at the beginning of the list.
> If they are at the end, the in oprerator is slower in these classes because of
> the overhead of function calls (in StateLt there is also an overhead due to
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> internal lookup, IMO).
  ^^^^^^^^^^^^^^^

Yes, that was my question! :)

But I hoped in a more exaustive answer: why python has to do this
lookup when the __lt__ method is not involved at all?




More information about the Python-list mailing list