Substring Detection? Pythonically?

Alex Martelli aleaxit at yahoo.com
Fri Oct 6 05:32:48 EDT 2000


"Stephen Hansen" <stephen at cerebralmaelstrom.com> wrote in message
news:20001005.191011.17414 at Jeremy.cerebralmaelstrom.com...
> LOL. I feel dumb. I don't really know precisely what is going on in that
> acode, largely because I don't yet completely understand lamdba's and try
> to avoid regexp's whenever possible, so don't exactly clearly understand
> them either. :) I'm very impressed it's so easily -- and perfectly! --
doable
> in eight lines. :)

I'm a big fan of the Extreme Programming principle #1: *DO THE SIMPLEST
THING THAT COULD POSSIBLY WORK*.  lambda, regexp, etc, are all fine and
dandy, but complexity has a nasty habit of coming back and biting you.

So, I would propose an alternative to <F>'s excellent (but complex)
regexp-based solution -- one based on utter simplicity.

Basically: you have a list of strings, and want the sub-list made up of
those list elements that start with a certain user-supplied string.
OK, then...:

class SimpleMatcher:
    def __init__(self, names):
        self.names = names
    def find(self, start):
        return [name for name in self.names if name.startswith(start)]

i.e., "return the list of those elements of self.names that start
with the user-supplied string" -- basically a restatement of the
problem-statement; it doesn't get much simpler than this.

Before 2.0 (list comprehension, startswith method of the string object),
the body of SimpleMatcher's find would be quite a bit longer, though the
complexity would be roughly equivalent:

    result = []
    startlen = len(start)
    for name in self.names:
        if name[:startlen] == start:
            result.append(name)
    return result


Performance isn't that bad:

>>> names = ['x%3.3d' % i for i in range(1000)]
>>> m=prov.Matcher(names)
>>> s=prov.SimpleMatcher(names)
>>> profile.run('y=m.find("x")')
         81 function calls (75 primitive calls) in 0.050 CPU seconds
    [snip]
>>> profile.run('y=s.find("x")')
         3 function calls in 0.013 CPU seconds

The profiler may be invasive, and distort relative performance in
favour of the simple-minded solution (I suspect there is some
miscalibration in Python 2.0b2's profiler, though I can't quite
put my finger on it), but at least this suggests that the simple
matcher isn't ruinously slower than the sophisticated one, and
the runtimes involved are probably low enough that it does not
really matter (depending on your patterns of usage, of course).


Alex






More information about the Python-list mailing list