Efficient String Lookup?

Chris S. chrisks at NOSPAM.udel.edu
Sat Oct 16 17:56:07 EDT 2004


Michael Hoffman wrote:
> Chris S. wrote:
> 
>> I have a number of strings, containing wildcards (e.g. 'abc#e#' where 
>> # is anything), which I want to match with a test string (e.g 
>> 'abcdef'). What would be the best way for me to store my strings so 
>> lookup is as fast as possible?
> 
> 
> If it were me, I would store them as compiled regular expressions.
> 
> See the re module documentation and use re.compile().
> 
> If you want a better solution, it might help if you supply a little more 
> information about your problem and why this solution is unsuitable 
> (maybe it is :]).

The problem is I want to associate some data with my pattern, as in a 
dictionary. Basically, my application consists of a number of 
conditions, represented as strings with wildcards. Associated to each 
condition is arbitrary data explaining "what I must do". My task is to 
find this data by match a state string with these condition strings. Of 
course, the brute force approach is to just add each pattern to a 
dictionary, and linearly seach every key for a match. To improve this, I 
considered a trie, implemented as a special dictionary:

class Trie(dict):
     '''Implements a traditional Patricia-style Tria.
     Keys must be sequence types. None key represents a value.'''

     def __init__(self):
         dict.__init__(self)

     def __setitem__(self, key, value):
         assert key, 'invalid key '+str(key)
         d = self
         last = None
         for n in key:
             if n not in d:
                 dict.__setitem__(d, n, {})
             last = (d,n)
             d = dict.__getitem__(d, n)
         (d,n) = last
         dict.__getitem__(d, n)[None] = value

     def __getitem__(self, key):
         d = self
         for n in key:
             assert n in d, 'invalid key '+str(key)
             d = dict.__getitem__(d, n)
         assert None in d, 'missing value for key '+str(key)
         return dict.__getitem__(d, None)

     def __delitem__(self, key):
         previous = []
         d = self
         for n in key:
             assert n in d, 'invalid key '+str(key)
             previous.append((d,n))
             d = dict.__getitem__(d, n)
         assert None in d, 'missing value for key '+str(key)
         # remove value
         dict.__delitem__(d, None)
         # find and remove empty keys
         while len(previous):
             (d,n) = previous.pop()
             if not len(dict.__getitem__(d, n)):
                 dict.__delitem__(d, n)
             else:
                 break

However, I'm uncertain about the efficiency of this approach. I'd like 
to use regexps, but how would I associate data with each pattern?



More information about the Python-list mailing list