What is the most efficient way to find similarities and differences between the contents of two lists?

Zachary Dziura zcdziura at gmail.com
Mon Jun 13 11:06:14 EDT 2011


Hi all.

I'm writing a Python script that will be used to compare two database
tables. Currently, both tables are dumped into .csv files, and my code
goes through both files and makes comparisons. So far, I have only
implemented comparisons of the headers, checking for similarities and
differences. Here is the code for that functionality:

similar_headers = 0
different_headers = 0
source_headers = sorted(source_mapping.headers)
target_headers = sorted(target_mapping.headers)

# Check if the headers between the two mappings are the same
if set(source_headers) == set(target_headers):
    similar_headers = len(source_headers)
else:
    # Make two passes over the header lists to find the similar
    # and different header names. Start with the source
    # headers...
    for source_header in source_headers:
        if source_header in target_headers:
            similar_headers += 1
        else:
            different_headers += 1
    # Now check target headers for any differences
    for target_header in target_headers:
        if target_header in source_headers:
            pass
        else:
            different_headers += 1

As you can probably tell, I make two passes: one over the
'source_headers' list and another over the 'target_headers' list.
During the first pass, if a header (bound to the loop variable
'source_header') exists in both lists, 'similar_headers' is
incremented by one; otherwise, 'different_headers' is incremented by
one. The second pass only counts headers that appear in
'target_headers' but not in 'source_headers'.
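For comparison, here is a minimal sketch of the same two counts using set operations: the intersection gives the headers found in both lists, and the symmetric difference gives the headers found in only one of them. It assumes header names are unique within each list (a set collapses duplicates, whereas the loops above would count each occurrence), and `compare_headers` is a hypothetical name, not part of the original script. Membership tests on a set are constant time on average, while `in` on a list scans the list, so this also avoids the repeated list scans inside both loops.

```python
def compare_headers(source_headers, target_headers):
    """Return (similar, different) counts for two lists of header names.

    Assumes names are unique within each list; duplicates collapse
    when the lists are converted to sets.
    """
    source = set(source_headers)
    target = set(target_headers)
    # Headers present in both lists
    similar = len(source & target)
    # Headers present in exactly one of the two lists
    different = len(source ^ target)
    return similar, different


# Example: 'id' and 'name' are shared; 'email' and 'phone' differ
print(compare_headers(["id", "name", "email"], ["id", "name", "phone"]))
```

The example prints `(2, 2)`. If the actual header names are needed rather than just counts, `sorted(source - target)` and `sorted(target - source)` list the headers missing from each side.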

My code works as expected, with no bugs, but I get the feeling that
I'm not doing this comparison as efficiently as possible. Is there
another way to make the same comparison that is more Pythonic and
efficient? I would prefer not to install a third-party module, though
I will if I have to.
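Since the goal is to stay within the standard library, here is a sketch of pulling the header rows straight from the .csv dumps with the built-in `csv` module. It assumes the first row of each dump holds the column names; `read_header_row` and `read_csv_headers` are hypothetical helper names, and how `source_mapping.headers` is actually built is not shown in the post.

```python
import csv


def read_header_row(lines):
    """Return the first CSV row, assumed to hold the column names.

    `lines` may be an open text file or any iterable of strings,
    since csv.reader accepts either.
    """
    reader = csv.reader(lines)
    # The default [] avoids StopIteration on an empty dump
    return next(reader, [])


def read_csv_headers(path):
    """Open a CSV dump at a (hypothetical) path and return its header row."""
    # newline="" is what the csv module docs recommend when opening files
    with open(path, newline="") as csv_file:
        return read_header_row(csv_file)


# Quoted fields containing commas are handled by the csv parser
print(read_header_row(['"last, first",id,email\n', '"Doe, Ann",1,a@example.com\n']))
```

The resulting lists could then be fed to the header comparison in place of the mapping objects; `csv.reader` takes care of quoting, which a plain `line.split(",")` would not.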

Thanks in advance for any and all answers!