Efficient String Lookup?

Bengt Richter bokr at oz.net
Sun Oct 17 02:50:58 EDT 2004


On Sun, 17 Oct 2004 05:50:49 GMT, "Chris S." <chrisks at NOSPAM.udel.edu> wrote:

>Bengt Richter wrote:
>
>> On Sat, 16 Oct 2004 09:11:37 GMT, "Chris S." <chrisks at NOSPAM.udel.edu> wrote:
>> 
>> 
>>>I have a number of strings, containing wildcards (e.g. 'abc#e#' where # 
>>>is anything), which I want to match with a test string (e.g 'abcdef'). 
>>>What would be the best way for me to store my strings so lookup is as 
>>>fast as possible?
>> 
>> 
>> Insufficient info. But 'fast as possible' suggests putting your strings in
>> a flex grammar and generating a parser in c. See
>>     http://www.gnu.org/software/flex/
>> Defining a grammar is a good exercise in precise definition of the problem anyway ;-)
>> 
>> If you want to do it in python, you still need to be more precise...
>> 
>> - is # a single character? any number of characters?
>> - if your test string were 'abcdefabcdef' would you want 'abc#e#' to match the whole thing?
>>   or match abcdef twice?
>> - if one wild card string matches, does that "use up" the test string so other wild card strings
>>   mustn't match? If so, what has priority? Longest? shortest? Other criterion? 
>> - etc etc
>
>Sorry for the ambiguity. My case is actually pretty simple. '#' 
>represents any single character, so it's essentially the same as re's 
>'.'. The match must be exact, so the string and pattern must be of equal 
>lengths. Each wildcard is independent of other wildcards. For example, 
>suppose we restricted the possible characters to 1 and 0, then the 
>pattern '##' would only match '00', '01', '10', and '11'. This pattern 
>would not match '0', '111', etc. I feel that a trie would work well, but 
>I'm concerned that for large patterns, the overhead in the Python 
>implementation would be too inefficient.

So is the set of patterns static and you want to find which pattern(s!)
match dynamic input? How many patterns vs inputs strings? What max
length patterns, input strings? Total volume?

Regards,
Bengt Richter



More information about the Python-list mailing list