What is the most efficient way to compare similar contents in two lists?

Mon Jun 13 13:58:12 EDT 2011

On Mon, Jun 13, 2011 at 8:09 AM, Chris Angelico <rosuav at gmail.com> wrote:

> On Tue, Jun 14, 2011 at 12:58 AM, Zachary Dziura <zcdziura at gmail.com>
> wrote:
> > if set(source_headers) == set(target_headers):
> >    similar_headers = len(source_headers)
>
> Since you're making sets already, I'd recommend using set operations -
> same_headers is the (length of the) intersection of those two sets,
> and different_headers is the XOR.
>
> # If you need the lists afterwards, use different variable names
> source_headers = set(source_headers)
> target_headers = set(target_headers)
> similar_headers = len(source_headers & target_headers)
> different_headers = len(source_headers ^ target_headers)
>

This is a beautiful solution, and yet I feel compelled to mention that it
disregards duplicates within a given list.  If you need duplicate
detection/differencing, it's better to sort each list and then use an
algorithm similar to the merge step of mergesort.

Using sets as above is O(n), while the sorting version is O(nlogn) usually.
O(n) is better than O(nlogn).

And of course, the version based on sorting assumes order doesn't matter.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20110613/51c3aab5/attachment-0001.html>