Efficient String Lookup?
Nicolas Lehuen
nicolas at lehuen.com
Sat Oct 16 12:53:34 EDT 2004
Josiah Carlson wrote:
>>I have a number of strings, containing wildcards (e.g. 'abc#e#' where #
>>is anything), which I want to match with a test string (e.g 'abcdef').
>>What would be the best way for me to store my strings so lookup is as
>>fast as possible?
>
>
> Start with a Trie, and virtually merge branches as necessary.
>
> - Josiah
>
Yup, you might also have a look at the Aho-Corasick algorithm, which can
match a test string against a big number of strings quite efficiently :
http://www-sr.informatik.uni-tuebingen.de/~buehler/AC/AC.html
You'll have to adapt the algorithm so that it can handle your wilcard,
though. I found it easy to implement the '.' (one character wildcard),
but the '.*' (zero or more character wildcard) forces you to have
backtracking.
But if you can use the regexp engine, do not hesitate, it will save you
a lot of headaches. Unless of course you're a student and your teacher
asked you this question ;).
Regards,
Nicolas
More information about the Python-list
mailing list