Performance of list vs. set equality operations

Raymond Hettinger python at rcn.com
Sat Apr 10 01:46:35 EDT 2010


> > I don't know if it's a good "fix" anyway.  If you subclass an internal
> > type, you can certainly supply your own rich comparison methods, which
> > would (IMO) put the CPU computation burden where it belongs if you
> > decide to do something goofy like subclass a list and then override
> > __len__.
>
> We're all consenting adults, that's the Python philosophy, isn't it?
> If I decide to make stupid things, it's my fault. I don't see why Python  
> should have to prevent that.

Perhaps so for pure python classes, but the C builtins are another
story.

The C containers directly reference underlying structure and methods
for several reasons.  The foremost reason is that if their internal
invariants are violated, they can segfault.  A list's __getitem__
method needs to know the real length (not what you report in __len__)
if it is to avoid writing objects outside of its allocated memory
range.  Another reason is efficiency -- the cost of attribute lookups
is high and would spoil the performance of the builtins if they could
not access their underlying structure and friend methods directly.
It is important to have those perform well because they are used
heavily
in everyday programming.

There are also couple of OOP design considerations.  The
http://en.wikipedia.org/wiki/Open/closed_principle is one example.

Encapsulation is another example.  If you override __len__
in order to influence the behavior of __eq__, then you're
relying on an implementation detail, not the published interface.
Eventhough the length check is an obvious optimization
for list equality and set equality, there is no guarantee
that other implementations of Python use that same pattern.

my-two-cents-ly yours,

Raymond





More information about the Python-list mailing list