Substring Detection? Pythonically?

Remco Gerlich scarblac at pino.selwerd.nl
Wed Oct 4 16:28:01 EDT 2000


Stephen Hansen <stephen at cerebralmaelstrom.com> wrote in comp.lang.python:
> Okay, say I have three different strings:
> 	#1: they
> 	#2: that
> 	#3: tommy
> 
> And a user gives me a string -- 'the', I want it to match to 'they'. Then
> say they give me a string, 'to', I want it to match to 'tommy'. A string
> of 'th' or 't' is ambiguious, and i want a list returned, ['they','that']
> and ['they','that','tommy'] respectively.

In the examples I'll give, it'll always a return of list, with one
element if there's only one option, that's a bit simpler.

> Note that my 'words' will end up being a rather big list, mebbe a hundred
> at least, and i'll be checking against it very frequently, so want this to
> be as efficient as possible.

If you want it to be really efficient speed-wise, you build a dictionary
with all prefixes, like

dict = {
   't' : ['they', 'that', 'tommy'],
   'th' : ['they', 'that', 'tommy'],
   'the': ['they'],
   'they': ['they'],
   etc
}

Building the dict is simple:
dict = {}
words = ['they', 'that', 'tommy']
for word in words:
   for index in range(1,len(word)+1):
      dict[word[:index]] = dict.get(word[:index],[])+[word]
      
Remember to use only copies of the lists in the dictionary, otherwise
it could be changed from outside.

Alternatively something like
def get_words(prefix, wordlist):
   return filter(lambda prefix=prefix,x: prefix==x[:len(prefix)], wordlist)
   
Or in 2.0 language
def get_words(prefix, wordlist):
   return [word for word in wordlist if word.startswith(prefix)]
   
That's simpler and doesn't store a dictionary, but the dictionary should
be faster (but if you're waiting for user input, that's not going to
matter anyway).

-- 
Remco Gerlich



More information about the Python-list mailing list