searching algorithm
Neil Cerutti
horpner at yahoo.com
Thu May 17 11:50:52 EDT 2007
On 2007-05-11, Terry Reedy <tjreedy at udel.edu> wrote:
>
> "Neil Cerutti" <horpner at yahoo.com> wrote in message
> news:lNM0i.34412$G23.27437 at newsreading01.news.tds.net...
>| Every node is a tuple of its letter, a list of its children, and
>| a list of its words. So the two 'pelin' nodes would be (with 'e'
>| referenced in the 'h' node):
>|
>| ('h', [('e', [], ['pelin'])], ['pelin'])
>|
>| That would in turn be "stored" in the t, n, i and s nodes.
> [snip]
>
> At the outer level, I would use a list in order to build the
> structure in pieces, one for each letter, and then add them in.
>
> At the top level, the letters do not need explicit storage.
> The first subtree is for words starting with 'a', etc. In
> other words, the position of each subtree indicates its
> starting letter. For most letters, this can be carried on
> another level.
Here's a proof of concept prototype of the dictionary searching
algorithm. The dictionary I found to test with has roughly 5,000
words, and I didn't attempt any optimizations at all. I'm
guessing it can't be sped up very much, even though I wrote the
simplest possible implementation. Lookup in the worst case is
O(N), where N is the number of letters in your prefix.
First, a sample of input/output.
lookup: pe
people: la gente[Noun]
pepper: pepe[Noun]
peppercorn: grano di pepe[Noun]
peppermint: menta peperita[Noun]
peppery: pepato, acre, pungente[Adjective]
pepsin: pepsina[Noun]
permutation: permutazione[Noun]
permutations: permutazioni[Noun]
permute: permutare[Verb]
perorate: perorare
perpendicular: perpendicolare[Adjective]
perpendicularly: perpendicolarmente[Adverb]
perpendiculars: perpendicolari[Noun]
perpetrate: perpetrare[Verb]
perpetrated: perpetrato[Verb]
perpetual: perpetuo[Adjective]
petard: petardo[Noun]
petroleum: petrolio[Noun]
Now the source.
# Each node is a tuple of (letter, node list, word list). A word
# list is just a list of strings.
root = ('', [], [])
def insert(node, key, value):
if len(key) == 0:
node[2].append(value)
else:
first = key[0]
rest = key[1:]
for n in node[1]:
if n[0] == first:
insert(n, rest, value)
return
node[1].append((first, [], []))
insert(node, key, value)
def lookup(node, key):
if len(key) == 0:
return node
else:
first = key[0]
rest = key[1:]
for v in node[1]:
if v[0] == first:
return lookup(v, rest)
return None
def word_list(node, word):
for v in node[2]:
print '%s: %s' % (word, v)
for n in node[1]:
word_list(n, word+n[0])
# Italian.txt from http://www.june29.com/IDP/
source = open('Italian.txt', 'r')
# Build tree
for line in source:
line = line.strip()
if line[0] == '#': continue
key, value = line.split('\t')
insert(root, key, value)
# Look up a prefix
x = raw_input("lookup: ")
tree = lookup(root, x)
if tree:
word_list(tree, x)
else:
print "Not found."
--
Neil Cerutti
More information about the Python-list
mailing list