Help Optimizing Word Search

Case Nelson case.nelson at gmail.com
Tue Jan 11 18:39:29 EST 2005


Hi there I've just been playing around with some python code and I've
got a fun little optimization problem I could use some help with.

Basically, the program needs to take in a random list of no more than
10 letters,  and find all possible mutations that match a word in my
dictionary (80k words). However a wildcard letter '?' is also an
acceptable character which increases the worst case time significantly.
So if the letters are ['a','b','c'] check a, b, c, ab, ac, ba, bc, ca,
cb, abc, acb, bac, bca, cab, cba where only a, ba and cab would be
added to the dict of words. If the letters are ['?','?'] check a-z, aa,
ab, ac, ad, ..., az, ba, bb, bc, bd, ..., zz

I'm using a trie structure to load and query my dictionary, which
returns a 1 if the word is found, 4 if a partial word is found, and 3
if there is no possible word.

I guess I'm wondering if anyone could think of a faster way to deal
with the wildcards, perhaps removing the inner for-loop or else
something entirely different (iterative, don't use the trie?).

Any other suggestions would be welcome

findword recursion runs in 9.16300010681 secs
words.log is simply my list of words:
['hats', 'easts', 'baizes',...,'sash']

--- Code Begin ---
PY> import trie
PY> import time
PY>
PY> print 'loading trie ...',
PY> mytrie = trie.Trie('dicts.txt')
PY> print 'done.'
PY>
PY> alpha = list('abcdefghijgklmnopqrstuvwxyz')
PY>
PY> words = {}
PY> def findword(word, letters):
PY>     # run out of letters, recursion stop point
PY>     if letters == []:
PY>         return
PY>
PY>     if word != "":
PY>         # Check if word is valid
PY>         result = mytrie.query(word)
PY>
PY>         # Found a word, add it to the database
PY>         if result == 1:
PY>             words[word] = 1
PY>
PY>         # No other word starts with my current word, recursion stop
point
PY>         if result == 3:
PY>             return
PY>
PY>     # Run through every letter in our list
PY>     for i,letter in enumerate(letters):
PY>         # Remove current letter and recurse
PY>         newletters = letters[:]
PY>         newletters.pop(i)
PY>
PY>         # if current letter is ? must recurse through all 26
letters
PY>         if letter == '?':
PY>             for c in alpha:
PY>                 # recurse word+wildcard letter
PY>                 findword(word+c,newletters)
PY>         else:
PY>                 # recurse word+letter
PY>                 findword(word+letter,newletters)
PY>
PY> letters = list('abae?s?')
PY> s = time.time()
PY> findword("",letters)
PY> print time.time() - s
PY>
PY> output = open('words.log','w')
PY> print >> output, words.keys()
PY> output.close()
PY>
--- Code End ---
Thanks 
(Hopefully google doesn't eat my whitespace)




More information about the Python-list mailing list