searching algorithm

Thu May 10 18:12:33 EDT 2007

On 2007-05-10, Gordon Airporte <JHoover at fbi.gov> wrote:
>
>> For the above (abrideged) dictionary, you would generate (use a
>> fixed-width "programmers" font so the tree looks good):
>> 
>>                  a
>>                  |
>>                  b
>>                  |
>>                  s
>>                 / \
>>                i   o
>>               /   / \
>>              n   l   r
>>             /   / \   \
>>            t   u   v   b->(absorbirati, crpisti)
>>           /    |   |
>> (pelin)<-h     t   e->(odrije?iti, osloboditi)
>>          |     |
>> (pelin)<-e     e->(apsolutan, apsolutni kod)
>> 
>> As the user enter letters, you just march down the tree, printing
>> all the words held in leaf nodes held in the current node.
>
> Call me dense, but how does one do this in Python - which
> doesn't have pointers? Dictionaries with dictionaries within
> dictionaries... (with each letter as the key and the its
> children as values) is going to be extremely space inefficient,
> right?

Unfortunately, I don't know the space tradeoffs in Python
offhand. Lists and tuples make excellent trees.

The above might be stored as follows:

Every node is a tuple of its letter, a list of its children, and
a list of its words. So the two 'pelin' nodes would be (with 'e'
referenced in the 'h' node):

('h', [('e', [], ['pelin'])], ['pelin'])

That would in turn be "stored" in the t, n, i and s nodes.

('s', 
 [('i', 
  [('n', 
   [('t', 
    [('h', [('e', [], ['pelin'])], ['pelin'])
     [])]
    [])]
   []), ('o' trie (thanks Terry) omitted for my sanity)])

It's a lot harder to write by hand than it would be to use.

My intuition says it shouldn't be terribly hard on resources for
for a 180K dictionary, but I could be wrong. I'm too lazy to
measure. ;)

If it does turn out to be unreasonably huge, then you'd fall back
on solving the problem completely with a binary search returning
a range (I'm not sure of the name), which would be more expensive
at run time, but might be fast enough, and would use a minimal
amount of 'resources'.

-- 
Neil Cerutti