Efficient String Lookup?

Chris S. chrisks at NOSPAM.udel.edu
Sun Oct 17 01:50:49 EDT 2004


Bengt Richter wrote:

> On Sat, 16 Oct 2004 09:11:37 GMT, "Chris S." <chrisks at NOSPAM.udel.edu> wrote:
> 
> 
>>I have a number of strings, containing wildcards (e.g. 'abc#e#' where # 
>>is anything), which I want to match with a test string (e.g. 'abcdef').
>>What would be the best way for me to store my strings so lookup is as 
>>fast as possible?
> 
> 
> Insufficient info. But 'fast as possible' suggests putting your strings in
> a flex grammar and generating a parser in c. See
>     http://www.gnu.org/software/flex/
> Defining a grammar is a good exercise in precise definition of the problem anyway ;-)
> 
> If you want to do it in python, you still need to be more precise...
> 
> - is # a single character? any number of characters?
> - if your test string were 'abcdefabcdef' would you want 'abc#e#' to match the whole thing?
>   or match abcdef twice?
> - if one wild card string matches, does that "use up" the test string so other wild card strings
>   mustn't match? If so, what has priority? Longest? shortest? Other criterion? 
> - etc etc

Sorry for the ambiguity. My case is actually pretty simple: '#' 
represents any single character, so it's essentially the same as re's 
'.'. The match must be exact, so the pattern and test string must be of 
equal length. Each wildcard is independent of the others. For example, 
if the possible characters were restricted to 1 and 0, the pattern '##' 
would match only '00', '01', '10', and '11'; it would not match '0', 
'111', etc. I feel a trie would work well, but I'm concerned that for 
large patterns the overhead of a pure-Python implementation would be 
too inefficient.
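For what it's worth, here's a minimal sketch of the trie idea under the 
semantics described above (plain nested dicts, '#' stored as an ordinary 
edge that the lookup treats as matching any character; the names and the 
sample patterns are just illustrations, not anything from the thread):

```python
# Trie of wildcard patterns. '#' matches exactly one character, and a
# match must consume the entire test string (equal lengths).

END = object()  # sentinel key marking "a stored pattern ends here"

def insert(trie, pattern):
    """Store one pattern ('#' = single-character wildcard) in the trie."""
    node = trie
    for ch in pattern:
        node = node.setdefault(ch, {})
    node[END] = True

def matches(trie, s):
    """Return True if any stored pattern matches s exactly."""
    nodes = [trie]              # set of trie nodes reachable so far
    for ch in s:
        nxt = []
        for node in nodes:
            if ch in node:      # follow the literal edge, if present
                nxt.append(node[ch])
            if '#' in node:     # the wildcard edge matches any character
                nxt.append(node['#'])
        if not nxt:
            return False
        nodes = nxt
    # only a match if some surviving node terminates a pattern
    return any(END in node for node in nodes)

trie = {}
for p in ('abc#e#', '##'):
    insert(trie, p)

print(matches(trie, 'abcdef'))  # True:  matched by 'abc#e#'
print(matches(trie, '01'))      # True:  matched by '##'
print(matches(trie, 'abcde'))   # False: lengths must be equal
```

Lookup cost grows with the number of trie nodes alive at once, which in 
the worst case (many wildcards in the same positions) can fan out, but 
shared literal prefixes are walked only once across all patterns.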
