Performance of list vs. set equality operations

Tue Apr 6 14:11:03 EDT 2010

Hello!

Could you please confirm whether my understanding of equality
operations in sets and lists is correct? This is how I think things
work, partially based on experimentation and the online documentation
for Python:

When you compare two lists, *every* element of one of the lists is
compared against the element at the same position in the other list;
that comparison is done by the __eq__() method (or the equivalent for
builtin types). This is interrupted when a result is False.

When you compare two sets, there's a loop over all the elements of the
first set, where the hash for that element is looked up in the second
set:
 - If this hash matches the hash for one or more elements in the
second set, the element in the first set is compared (with __eq__ or
equivalent) against the elements in the second set which have the same
hash. When a result is True, nothing else is done on that element and
the loop takes the next element in the first set; when all the results
are False, the loop ends and the two sets are not equivalent.
 - If the hash doesn't match that of an element in the second set,
then the loop ends and the two sets are not equivalent.

So this means that:
 1.- When you have two collections which have the same elements, the
equality operation will *always* be faster with lists.
 2.- When you have two collections with different elements, the
equality operation *may* be faster with sets.

For example, if you have two collections of 1,000 elements each and
998 of them are equivalent, comparing both collections as sets will be
slower than doing it with lists. But if you have two collections of
1,000 elements each and 998 of them are not equivalent, then comparing
both collections as lists will be slower than doing it with sets.

The performance of equality operations on sets is directly
proportional to the amount of different elements in both sets, while
the performance of equality operations on lists is simply proportional
to the cardinality of the collection.

In other words: The more different elements two collections have, the
faster it is to compare them as sets. And as a consequence, the more
equivalent elements two collections have, the faster it is to compare
them as lists.

Is this correct?

This is why so many people advocate the use of sets instead of lists/
tuples in similar situations, right?

Cheers,

 - Gustavo.