Find closest matching string based on collection of strings in list/dict/set

Shashwat Anand anand.shashwat at gmail.com
Tue Aug 31 15:57:22 EDT 2010


On Tue, Aug 31, 2010 at 9:12 PM, <python at bdurham.com> wrote:

> I'm parsing a simple, domain specific scripting language that has commands
> like the following: *page, *title, *text, *footer, etc. There are about 100
> of these '*' commands. When we see a command that we don't recognize, I
> would like to find the closest match possible (from a list of all legal
> commands) and include this suggestion in our diagnostic output.
>
> I'm not sure what you would call the type of algorithm I'm looking for:
> closest matching string or auto-correct?
>
> Any suggestions on algorithms or python libraries that would help me do
> what I'm looking for?
>
> Here's my short-list of ideas based on my research so far:
> - Soundex
> - Lawrence Philips' Metaphone Algorithm (from aspell?)
> - Edit Distance Algorithm
> - Levenstein Word Distance
>
> Any suggestions, comments on the above techniques, or ideas on a simpler
> algorithm for finding close string matches based on a list, dict, or set of
> possible strings?
>

You are looking possibly for autocompletion/spell correction algorithm. You
can write a HMM based Pos Tagger.


> Thank you,
> Malcolm
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>


-- 
~l0nwlf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20100901/63bb2705/attachment-0001.html>


More information about the Python-list mailing list