Efficient String Lookup?

Diez B. Roggisch deetsNOSPAM at web.de
Sat Oct 16 08:22:24 EDT 2004


Chris S. wrote:

> I have a number of strings, containing wildcards (e.g. 'abc#e#' where #
> is anything), which I want to match with a test string (e.g 'abcdef').
> What would be the best way for me to store my strings so lookup is as
> fast as possible?

As a compiled regular expression, I guess - you don't give much info here,
so maybe there is a better way. But to me it looks like a classic regexp
thing. Maybe if your wildcards are equivalent to .*, then using subsequent
string searches lik this helps you:

pattern = 'abc#e#'.split('#')
s = 'abcdef'
found = True
pos = 0
for p in pattern:
    h = s.find(p)
    if h != -1:
        p = h + 1b
    else:
        found = False


That might be faster, if the string.find operation uses something else than
simple brute force linear searching - but I don't know enough about the
internals of python's string implementation to give an definite answer
here.

But to be honest: I don't think regexps are easy to beat, unless your
usecase is modeled in a way that makes it prone to other approaches.

-- 
Regards,

Diez B. Roggisch



More information about the Python-list mailing list