Comparing lists

Steven D'Aprano steve at REMOVETHIScyber.com.au
Mon Oct 10 11:00:42 EDT 2005


On Mon, 10 Oct 2005 14:34:35 +0200, Christian Stapfer wrote:

> Sorting the two lists and then extracting
> A-B, B-A, A|B, A & B and A ^ B in one single
> pass seems to me very likely to be much faster
> for large lists.

Unless you are running a Python compiler in your head, chances are your
intuition has no connection to the actual performance of Python except by
pure coincidence.

Write some code, and time it in action.

In fact, here is some code I've written for you, complete with timer. Do
yourself a favour, and run it and see for yourself which is faster.

Then, if you think you can write a list version that is more efficient
than my (admittedly very inefficient) version, please do so. Then time the
difference again. Then think about how much time it took you to write,
test and debug your list version, compared to my four lines using sets.


from sets import Set
import time

def compare_and_separate(A, B):
    only_A = []
    only_B = []
    both_AB = []
    for item in A:
        # ignore items we've already seen before
        if item in only_A or item in both_AB:
            continue
        if item in B:
            both_AB.append(item)
        else:
            only_A.append(item)
    for item in B:
        # ignore items we've already seen before
        if item in only_B or item in both_AB:
            continue
        only_B.append(item)
    return (only_A, only_B, both_AB)

def compare_and_separate_with_sets(A, B):
    only_A = list(Set(A)-Set(B))
    only_B = list(Set(B)-Set(A))
    both_AB = list(Set(A+B) - Set(only_A+only_B))
    return (only_A, only_B, both_AB)

A = range(199) + range(44, 300, 3) + range(250, 500) + range(2000, 2100)
B = range(1000) + range(3000, 4000)

def tester():
    # confirm that the two functions give the same results
    r1 = compare_and_separate(range(5), range(3,8)) 
    r2 = compare_and_separate_with_sets(range(5), range(3,8))
    print r1
    print r2
    if r1 == r2:
        print "  Same."
    else:
        print "  Warning: results are different."
    loopit = range(20)  # repeat the test 20 times for timing purposes
    t1 = time.time()
    for i in loopit:
        results = compare_and_separate(A, B)
    t1 = time.time() - t1
    t2 = time.time()
    for i in loopit:
        results = compare_and_separate_with_sets(A, B)
    t2 = time.time() - t2
    print "Time with loops:", t1
    print "Time with sets:", t2



-- 
Steven.




More information about the Python-list mailing list