Efficient String Lookup?
Chris S.
chrisks at NOSPAM.udel.edu
Sat Oct 16 17:56:07 EDT 2004
Michael Hoffman wrote:
> Chris S. wrote:
>
>> I have a number of strings, containing wildcards (e.g. 'abc#e#' where
>> # is anything), which I want to match with a test string (e.g
>> 'abcdef'). What would be the best way for me to store my strings so
>> lookup is as fast as possible?
>
>
> If it were me, I would store them as compiled regular expressions.
>
> See the re module documentation and use re.compile().
>
> If you want a better solution, it might help if you supply a little more
> information about your problem and why this solution is unsuitable
> (maybe it is :]).
The problem is I want to associate some data with my pattern, as in a
dictionary. Basically, my application consists of a number of
conditions, represented as strings with wildcards. Associated to each
condition is arbitrary data explaining "what I must do". My task is to
find this data by match a state string with these condition strings. Of
course, the brute force approach is to just add each pattern to a
dictionary, and linearly seach every key for a match. To improve this, I
considered a trie, implemented as a special dictionary:
class Trie(dict):
'''Implements a traditional Patricia-style Tria.
Keys must be sequence types. None key represents a value.'''
def __init__(self):
dict.__init__(self)
def __setitem__(self, key, value):
assert key, 'invalid key '+str(key)
d = self
last = None
for n in key:
if n not in d:
dict.__setitem__(d, n, {})
last = (d,n)
d = dict.__getitem__(d, n)
(d,n) = last
dict.__getitem__(d, n)[None] = value
def __getitem__(self, key):
d = self
for n in key:
assert n in d, 'invalid key '+str(key)
d = dict.__getitem__(d, n)
assert None in d, 'missing value for key '+str(key)
return dict.__getitem__(d, None)
def __delitem__(self, key):
previous = []
d = self
for n in key:
assert n in d, 'invalid key '+str(key)
previous.append((d,n))
d = dict.__getitem__(d, n)
assert None in d, 'missing value for key '+str(key)
# remove value
dict.__delitem__(d, None)
# find and remove empty keys
while len(previous):
(d,n) = previous.pop()
if not len(dict.__getitem__(d, n)):
dict.__delitem__(d, n)
else:
break
However, I'm uncertain about the efficiency of this approach. I'd like
to use regexps, but how would I associate data with each pattern?
More information about the Python-list
mailing list