searching algorithm

Michael Bentley michael at jedimindworks.com
Fri May 11 04:50:14 EDT 2007


On May 10, 2007, at 12:26 PM, Gigs_ wrote:

> Hi all!
>
> I have text file (english-croatian dictionary) with words in it in  
> alphabetical
> order.
> This file contains 179999 words in this format:
> english word: croatian word
>
> I want to make instant search for my gui
> Instant search, i mean that my program search words and show words  
> to user as
> user type letters.
> yes, it needs to be fast
>
> Can someone give me some points (approaches) how to make this
> Should i make indexes and go with file.seek
>
> or should breake dictionary in peaces and put words that start with  
> a in one and
> with b in another...
> ?????
>
> So if i have this words
> ...
>
> if user type: "abs" program should list all words above in english  
> and in croatian
> if user type: "absorb" than program should list last 3 words in  
> english and in
> croatian
>
>
>
>
>
> any help would be appreciate!
> my apologies for bad english

Here's an idea:  use a rats' nest of dictionaries and do all the  
lookup work up front when you build the rats' nest.  Maybe something  
like this:

#! /usr/bin/env python
import pprint

dictionary = """absinth:pelin
absinthe:pelin
absolute:apsolutan
absolute:apsolutni kod
absolute:apsolutno
absolute:čist
absolute:nesumnjiv
absolute:potpun
absolute:savrsen
absolute coordinates:apsolutne koordinate
absolute frequency:apsolutna učestalost
absolute gap:apsolutni jaz
absolute line spacing:apsolutni međurazmak linija
absolute majority:apsolutna većina
absolute pointing device:apsolutni pokazivački uređaj
absolute quantity:apsolutni udio
absolute value:apsolutna vrijednost
absolute zero:apsolutna nula
absolutely:apsolutno
absolutely:bezuvjetno
absolutely:nezavisno
absolutely:potpuno
absolutely:samostalno
absolutely:sasvim
absolution:odrjesenje
absolution:oprostaj
absolutism:apsolutizam
absolve:odrijesiti
absolve:osloboditi
absorb:absorbirati
absorb:apsorbirati
absorb:crpsti"""

lookup = {'words':{}, 'letters':{}}

for translation in dictionary.split('\n'):
     english, croatian = translation.split(':')
     if english in lookup['words']:
         lookup['words'][english].append(croatian)
     else:
         lookup['words'][english] = [croatian]

     for position, letter in enumerate(english):
         if position == 0:
             youAreHere = lookup['letters']

         if letter not in youAreHere:
             youAreHere[letter] = {'words':[]}
         youAreHere[letter]['words'].append(lookup['words'][english])
         youAreHere = youAreHere[letter]

def tryit(partial):
     youAreHere = lookup['letters']
     for letter in partial:
         youAreHere = youAreHere[letter]
     return youAreHere['words']

if __name__ == '__main__':
     pprint.pprint(tryit('abso'))


Hope this helps,
Michael
---
The Rules of Optimization are simple.
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- Michael A. Jackson , "Principles of
Program Design", 1975.





More information about the Python-list mailing list