Find closest matching string based on collection of strings in list/dict/set

Joel Goldstick joel.goldstick at columbuswebmakers.com
Tue Aug 31 17:01:26 EDT 2010


python at bdurham.com wrote:
> I'm parsing a simple, domain specific scripting language that has
> commands like the following: *page, *title, *text, *footer, etc.
> There are about 100 of these '*' commands. When we see a command
> that we don't recognize, I would like to find the closest match
> possible (from a list of all legal commands) and include this
> suggestion in our diagnostic output.
> 
> I'm not sure what you would call the type of algorithm I'm
> looking for: closest matching string or auto-correct?
> 
> Any suggestions on algorithms or python libraries that would help
> me do what I'm looking for?
> 
> Here's my short-list of ideas based on my research so far:
> - Soundex
> - Lawrence Philips' Metaphone Algorithm (from aspell?)
> - Edit Distance Algorithm
> - Levenstein Word Distance
> 
> Any suggestions, comments on the above techniques, or ideas on a
> simpler algorithm for finding close string matches based on a
> list, dict, or set of possible strings?
> 
> Thank you,
> Malcolm
> 
> 
Have you looked at difflib?



More information about the Python-list mailing list