Match beginning of two strings

Alex Martelli aleax at aleax.it
Mon Aug 4 05:30:50 EDT 2003


Jeff Epler wrote:
   ...
> .. unfortunately, it seems to be slower than the first method.  On my
> machine (800MHz PIII):
> $ python timeit.py -s 'import ravi' \
>     'ravi.extract("abcdefghijklmnopqrstuvwxyz","abcdefghijklmnopBHLHT")'
> 10000 loops, best of 3: 32.7 usec per loop

Here's my proposal...:

import sys

def extract(a, b):
    m = min(len(a), len(b))
    for i in range(m):
        if a[i] != b[i]:
            return a[:i]
    return a[:m]

def extract2(a, b):
    for i, ai, bi in zip(xrange(sys.maxint), a, b):
        if ai != bi: return a[:i]
    return a[:m]

def extract3(a, b):
    for i, ai in enumerate(a):
        if ai != b[i:i+1]:
            return a[:i]
    return a

[alex at lancelot python2.3]$ python -O timeit.py -s 'import exa'
'exa.extract("abcdefghijklmnopqrstuvwxyz","abcdefghijklmnopBHLHT")'
100000 loops, best of 3: 13.9 usec per loop
[alex at lancelot python2.3]$ python -O timeit.py -s 'import exa'
'exa.extract2("abcdefghijklmnopqrstuvwxyz","abcdefghijklmnopBHLHT")'
10000 loops, best of 3: 19.7 usec per loop
[alex at lancelot python2.3]$ python -O timeit.py -s 'import exa'
'exa.extract3("abcdefghijklmnopqrstuvwxyz","abcdefghijklmnopBHLHT")'
100000 loops, best of 3: 15.7 usec per loop

Now add after the "import sys" two lines:

import psyco
psyco.full()

and run again:

[alex at lancelot python2.3]$ python -O timeit.py -s 'import exa'
'exa.extract("abcdefghijklmnopqrstuvwxyz","abcdefghijklmnopBHLHT")'
1000000 loops, best of 3: 0.771 usec per loop

[alex at lancelot python2.3]$ python -O timeit.py -s 'import exa'
'exa.extract3("abcdefghijklmnopqrstuvwxyz","abcdefghijklmnopBHLHT")'
100000 loops, best of 3: 3.37 usec per loop

However, extract2 doesn't run correctly with psyco (gets a MemoryError).

Still, the 18-times acceleration that psyco is able to effect on
the naive extract IS typical of psyco's effect on functions coded
in simple, elementary terms.  When you really need speed, assuming
that your processor is Intel-compatible, consider psyco (of course,
you'll generally use psyco.profile, or something more selective
still, rather than psyco.full) -- orders-of-magnitude improvements
on low-level bottleneck functions are anything but surprising...


Alex





More information about the Python-list mailing list