regex-strategy for finding *similar* words?

Christoph Pingel ch.pingel at web.de
Fri Nov 19 05:17:11 EST 2004


>There
>are faster algorithms available than the dynamic-programming one --
>google "Heikki Hyyro". These allow you to preprocess your input word
>and then compare with multiple dictionary candidates in O(kn) instead
>of O(kmn) time where the input word has length m, and you compare with
>k dictionary words each of average length n.

John,

thanks for the rich input!
I found the Hyyro/Navarro algorithm, and this looks interesting. 
First I will give python's own difflib.SequenceMatcher class a try - 
the project here doesn't necessarily need the fastest algorithm, 
coding time is an issue however. I found that a match ratio slightly 
above 0.85 captures most of my misspelling cases without producing 
errors.

best,
Christoph






More information about the Python-list mailing list