Which is faster? (if not b in m) or (if m.count(b) > 0)

Wed Feb 15 12:25:01 EST 2006

Steven D'Aprano wrote:
> On Wed, 15 Feb 2006 08:44:10 +0100, Marc 'BlackJack' Rintsch wrote:

>>``if not b in m`` looks at each element of `m` until it finds `b` in it
>>and stops then.  Assuming `b` is in `m`, otherwise all elements of `m` are
>>"touched".
>>
>>``if m.count(b) > 0`` always goes through all elements of `m` in the
>>`count()` method.
> 
> 
> But the first technique executes in (relatively slow) pure Python, while
> the count method executes (relatively fast) C code. So even though count
> may do more work, it may do it faster.

'a in b' actually takes fewer bytecodes, because 'in' has its own 
bytecode but b.count requires an attribute lookup and a function call:

  >>> import dis
  >>> def foo():
  ...   a in b
  ...
  >>> def bar():
  ...   b.count(a)
  ...
  >>> dis.dis(foo)
   2           0 LOAD_GLOBAL              0 (a)
               3 LOAD_GLOBAL              1 (b)
               6 COMPARE_OP               6 (in)
               9 POP_TOP
              10 LOAD_CONST               0 (None)
              13 RETURN_VALUE
  >>> dis.dis(bar)
   2           0 LOAD_GLOBAL              0 (b)
               3 LOAD_ATTR                1 (count)
               6 LOAD_GLOBAL              2 (a)
               9 CALL_FUNCTION            1
              12 POP_TOP
              13 LOAD_CONST               0 (None)
              16 RETURN_VALUE
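
The opcode names and offsets above are specific to the interpreter I ran them on. As a rough sketch for checking the same thing on a current Python 3 (where the names differ, so nothing here hard-codes them), the instructions can be listed with dis.get_instructions. The helper name opnames is just for illustration:

```python
import dis

def foo():
    a in b          # only compiled, never called, so a and b need not exist

def bar():
    b.count(a)      # only compiled, never called

def opnames(func):
    """Return the list of opcode names that func compiles to."""
    return [ins.opname for ins in dis.get_instructions(func)]

foo_ops = opnames(foo)
bar_ops = opnames(bar)

# The exact names vary between Python versions; only the counts are compared.
print(len(foo_ops), foo_ops)
print(len(bar_ops), bar_ops)
```

On the versions I am aware of, bar comes out longer than foo because of the attribute lookup and the call, but the bytecode count alone says little about run time.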

There is not much difference in speed when the item is not in the list, 
with a slight edge to 'a in b' for a large list:

D:\>python -m timeit -s "lst = range(1000)" "1001 in lst"
10000 loops, best of 3: 55.5 usec per loop

D:\>python -m timeit -s "lst = range(1000)" "lst.count(1001) > 1"
10000 loops, best of 3: 55.5 usec per loop

D:\>python -m timeit -s "lst = range(1000000)" "lst.count(-1) > 1"
10 loops, best of 3: 62.2 msec per loop

D:\>python -m timeit -s "lst = range(1000000)" "(-1) in lst"
10 loops, best of 3: 60.8 msec per loop
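
The same kind of measurement can be made from a script with the timeit module. This is a minimal sketch, assuming Python 3 (where range() is lazy, so the list has to be built explicitly). The function name compare and its parameters are made up for the example, and it uses the '> 0' test from the original question:

```python
import timeit

def compare(size, target, number=20, repeat=3):
    """Return the best per-loop time in seconds for 'in' and for count()
    when looking for target in a list of the integers 0..size-1."""
    env = {"lst": list(range(size)), "target": target}
    statements = (
        ("in", "target in lst"),
        ("count", "lst.count(target) > 0"),
    )
    results = {}
    for label, stmt in statements:
        times = timeit.repeat(stmt, globals=env, number=number, repeat=repeat)
        results[label] = min(times) / number
    return results

# -1 is never in the list, so both statements scan every element.
miss = compare(100_000, -1)
for label, seconds in miss.items():
    print("%-5s %.1f usec per loop" % (label, seconds * 1e6))
```

The absolute numbers depend on the machine and the Python version, so only the comparison between the two lines is meaningful.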

Of course, if the item appears near the start of the list, 'a in b' has a 
huge advantage, because it stops at the first match:

D:\>python -m timeit -s "lst = range(1000000)" "(1) in lst"
10000000 loops, best of 3: 0.171 usec per loop
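
The reason is the early exit, which can be made visible without any timing by counting equality comparisons. A small sketch, assuming CPython's list implementation. The Counted wrapper and the comparisons helper are invented for the illustration:

```python
class Counted:
    """Wraps a value and records every == comparison in a shared tally."""

    def __init__(self, value, tally):
        self.value = value
        self.tally = tally          # one-element list used as a shared counter

    def __eq__(self, other):
        self.tally[0] += 1
        return self.value == other


def comparisons(size, target):
    """Return (comparisons used by 'in', comparisons used by count())."""
    tally = [0]
    lst = [Counted(i, tally) for i in range(size)]

    target in lst                   # stops at the first match
    used_by_in = tally[0]

    tally[0] = 0
    lst.count(target)               # always visits every element
    used_by_count = tally[0]
    return used_by_in, used_by_count


print(comparisons(1000, 1))     # item at index 1: (2, 1000)
print(comparisons(1000, -1))    # item missing:    (1000, 1000)
```

When the item is missing, both do the same number of comparisons, which matches the nearly identical timings above.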

Kent