[Python-ideas] Changing `Sequence.__contains__`

Mark Lawrence breamoreboy at yahoo.co.uk
Mon Jul 21 19:26:47 CEST 2014


On 21/07/2014 03:59, Andrew Barnert wrote:
> On Sunday, July 20, 2014 7:40 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>
>>> On 21/07/2014 03:18, Andrew Barnert wrote:
>>>   On Sunday, July 20, 2014 7:06 PM, Mark Lawrence
>> <breamoreboy at yahoo.co.uk> wrote:
>>>
>>>>>   On 20/07/2014 23:06, Ram Rachum wrote:
>>>>>     Why does the default `Sequence.__contains__` iterate through the
>> items
>>>>>     rather than use `.index`, which may sometimes be more efficient?
>>>>>
>>>>>     I suggest an implementation like this:
>>>>>
>>>>>          def __contains__(self, i):
>>>>>              try: self.index(i)
>>>>>              except ValueError: return False
>>>>>              else: return True
>>>>>     What do you think?
>>>>>
>>>>
>>>>   I don't see how that can be more efficient than the naive
>>>>
>>>>   def __contains__(self, i):
>>>>         for elem in self:
>>>>             if elem == i:
>>>>                 return True
>>>>         return False
>>>>
>>>>   What am I missing?
>>>
>>>
>>>   Consider a blist.sortedlist
>> (http://stutzbachenterprises.com/blist/sortedlist.html), or any other such data
>> structure built on a tree, skip list, etc.
>>>
>>>   The index method is O(log N), so Ram's __contains__ is also O(log N).
>> But naively iterating is obviously O(N). (In fact, it could be worse—if you
>> don't implement a custom __iter__, and your indexing is O(log N), then the
>> naive __contains__ will be O(N log N)…)
>>>
>>>   Needless to say, blist.sortedlist implements a custom O(log N)
>> __contains__, and so does (hopefully) every other such library on PyPI. But
>> Ram's proposal would mean they no longer have to do so; they'll get
>> O(log N) __contains__ for free just by implementing index.
>>>
>>>   Of course that only removes one method. For example, they still have to
>> implement a custom count method or they'll get O(N) performance from the
>> default version. If you look at the code for any of these types, __contains__ is
>> a tiny percentage of the implementation. So, it's not a huge win. But
>> it's a small one.
>>>
>>
>> What has blist.sortedlist, which IIRC is one of the data structures that
>> has been rejected as forming part of the standard library, got to do
>> with the default sequence.__contains__ ?
>
> I think you're missing the whole point here.
>
> Sequence is an ABC—an Abstract Base Class—that's used (either by inheritance, or registration) by a wide variety of sequence classes—built-in, stdlib, or third-party.
>
>
> Like most of the other ABCs in the Python stdlib, it's also usable as a mixin, providing default implementations for methods that you don't want to provide in terms of those that you do. Among the mixin methods it provides is __contains__, as documented at https://docs.python.org/dev/library/collections.abc.html#collections-abstract-base-classes and implemented at http://hg.python.org/cpython/file/default/Lib/_collections_abc.py#l629
>
> I suspect the problem is that you parsed "the default Sequence.__contains__" wrong; Ram was referring to "the default implementation of __contains__ provided as Sequence.__contains__", but you thought he was referring to "the implementation of __contains__ in the default sequence", and whatever "the default sequence means" it obviously can't be a class from a third-party module, right?
>

Thanks for the explanation and yes I did parse it incorrectly. 
Strangely everything seems much clearer at 6PM rather than 3AM :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com




More information about the Python-ideas mailing list