Efficient String Lookup?

Nicolas Lehuen nicolas at lehuen.com
Sat Oct 16 12:53:34 EDT 2004


Josiah Carlson wrote:
>>I have a number of strings, containing wildcards (e.g. 'abc#e#' where # 
>>is anything), which I want to match with a test string (e.g 'abcdef'). 
>>What would be the best way for me to store my strings so lookup is as 
>>fast as possible?
> 
> 
> Start with a Trie, and virtually merge branches as necessary.
> 
>  - Josiah
> 

Yup, you might also have a look at the Aho-Corasick algorithm, which can 
match a test string against a big number of strings quite efficiently :

http://www-sr.informatik.uni-tuebingen.de/~buehler/AC/AC.html

You'll have to adapt the algorithm so that it can handle your wilcard, 
though. I found it easy to implement the '.' (one character wildcard), 
but the '.*' (zero or more character wildcard) forces you to have 
backtracking.

But if you can use the regexp engine, do not hesitate, it will save you 
a lot of headaches. Unless of course you're a student and your teacher 
asked you this question ;).

Regards,

Nicolas



More information about the Python-list mailing list