[Python-ideas] Changing `Sequence.__contains__`

Andrew Barnert abarnert at yahoo.com
Mon Jul 21 04:18:44 CEST 2014


On Sunday, July 20, 2014 7:06 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:

> > On 20/07/2014 23:06, Ram Rachum wrote:
>>  Why does the default `Sequence.__contains__` iterate through the items
>>  rather than use `.index`, which may sometimes be more efficient?
>> 
>>  I suggest an implementation like this:
>> 
>>       def __contains__(self, i):
>>           try: self.index(i)
>>           except ValueError: return False
>>           else: return True
>>  What do you think?
>> 
> 
> I don't see how that can be more efficient than the naive
> 
> def __contains__(self, i):
>      for elem in self:
>          if elem == i:
>              return True
>      return False
> 
> What am I missing?


Consider a blist.sortedlist (http://stutzbachenterprises.com/blist/sortedlist.html), or any other such data structure built on a tree, skip list, etc.

The index method is O(log N), so Ram's __contains__ is also O(log N). But naively iterating is obviously O(N). (In fact, it could be worse—if you don't implement a custom __iter__, and your indexing is O(log N), then the naive __contains__ will be O(N log N)…)

Needless to say, blist.sortedlist implements a custom O(log N) __contains__, and so does (hopefully) every other such library on PyPI. But Ram's proposal would mean they no longer have to do so; they'll get O(log N) __contains__ for free just by implementing index.

Of course that only removes one method. For example, they still have to implement a custom count method or they'll get O(N) performance from the default version. If you look at the code for any of these types, __contains__ is a tiny percentage of the implementation. So, it's not a huge win. But it's a small one.


More information about the Python-ideas mailing list