Optimising list comparison

Graham Ashton graz at mindless.com
Sat Mar 16 06:00:13 EST 2002


On Fri, 15 Mar 2002 22:33:29 +0000, Tim Peters wrote:

> [Graham Ashton]
>>     def compare():
>>         for i in range(len(list1)):
>>             a = list1[i]
>>             b = list2[i]
>>             if (a != b) and (a != -1) and (b != -1):
>>                 return 0
>>         return 1
> 
> You could also exploit Python's chained comparisons:
> 
>     if -1 != a != b != -1:
>         return 0

That's quite neat. I'll bear it in mind. I do see significant
differences in performance for different data sets, as you pointed out.

>     if list1[i] + list2[i] == 1:
>         return 0
> 
> should work fine, and runs at the same speed regardless of relative
> frequency of the various input-pair possibilities.

Very cunning. And yes, it does go nearly as fast as the original
comparison, even under the original loop's most favourable conditions.
It beats everything when there is more varied input. :)

> Then it's actually quite difficult to explain why
> 
> import operator
> def attempt5(list1, list2, sum=operator.__add__):
>     return 1 not in map(sum, list1, list2)
> 
> isn't quicker than an explicit loop doing the same thing.

And it's not. I've tested it, and though it comes quite close I must
admit I'm surprised. Has it got something to do with the "1 not in seq"
part having to traverse the whole sequence?

> Keep trying <wink>.  Also give
> 
>     for a, b in zip(list1, list2):
> 
> a shot.  It may surprise you <wink>.

It's not bad, but the fastest is still attempt4 (see below).

>> [breaking out of map()]
>
> You can't, short of raising an exception.  Whether you *want* to break
> out early depends on the expected distribution of input pairs; e.g., if,
> like the example you're timing, it's typical that the vectors start with
> a long stretch of equal elements, breaking early would be much more
> useful if you started comparing at the end.

I'm not sure about that - that's my friend's research part that I'm not
familiar with, but I'm assuming that the frequency of a mismatch will be
low, but they could occur at any point in the string.

> Well, your specific test case always raises an exception.  Exceptions
> are cheaper if they're not raised.  In the intended application, what
> percentage of the time do you expect inequality to occur?  An exception
> gets more attractive the less often inequality obtains.

I don't know that either. We'll have to ask him. Matt - are you reading 
this?

> Note that having to use a lambda is also a drag on speed.  map() speed
> tricks are much more effective if a builtin function can be mapped.

That's very useful to know.

> However (and this is a clue to an earlier mystery), an infix "+" applied
> to integers is much quicker than operator.__add__ applied to integers.

Ah! I see. Thanks for all that Tim - a very enlightening conversation.

>> Here's my complete script:
>> ...
> Thanks!  That's very helpful.

Well for those who've been following this, here's my latest version.
attempt5() is commented out because you need 2.2 to get it to run.

----cut----
#!/usr/bin/env python

import time

list1 = [1] * 20
## list1 = [1] * 9 + [0] + [1] * 10
list2 = [1] * 19 + [0]

def benchmark(func, iters=100000, l1=list1, l2=list2):
    start = time.time()
    for i in range(iters):
        r = func(l1, l2)
    end = time.time()
    print "%d iterations of %s took: %.6f secs" % (iters, func, end - start)

def attempt1(l1, l2):
    for i in range(len(l1)):
        if (l1[i] != l2[i]) and (l1[i] != -1) and (l2[i] != -1):
            return 0
    return 1

func = lambda a, b: (a == b) or (a == -1) or (b == -1) or 1 / 0

def attempt2(l1, l2):
    try:
        map(func, l1, l2)
    except ZeroDivisionError:
        pass

def attempt3(l1, l2):
    for i in range(len(l1)):
        if -1 != l1[i] != l2[i] != -1:
            return 0
    return 1

def attempt4(l1, l2):
    for i in range(len(l1)):
        if l1[i] + l2[i] == 1:
            return 0
    return 1

## import operator
## def attempt5(l1, l2, sum=operator.__add__):
##     return 1 not in map(sum, l1, l2)

def attempt6(l1, l2):
    for a, b in zip(l1, l2):
        if a + b == 1:
            return 0
    return 1

if __name__ == "__main__":
##     for f in (attempt1, attempt2, attempt3, attempt4, attempt5, attempt6):
    for f in (attempt1, attempt2, attempt3, attempt4, attempt6):
        benchmark(f)
----cut----

ratchet% python listmatch.py
100000 iterations of <function attempt1 at 0x8105e9c> took: 3.349525 secs
100000 iterations of <function attempt2 at 0x810340c> took: 7.734273 secs
100000 iterations of <function attempt3 at 0x811c254> took: 4.878889 secs
100000 iterations of <function attempt4 at 0x812b1f4> took: 3.413134 secs
100000 iterations of <function attempt6 at 0x811c28c> took: 4.656984 secs

-- 
Graham



More information about the Python-list mailing list