Looking for library to estimate likeness of two strings

JKPeck JKPeck at gmail.com
Thu Feb 7 09:56:46 EST 2008


On Feb 7, 6:11 am, Lee Capps <lca... at cteresource.org> wrote:
> At 14:01 Wed 06 Feb 2008, agen... at gmail.com wrote:
>
> >Are there any Python libraries implementing measurement of similarity
> >of two strings of Latin characters?
>
> >I'm writing a script to guess-merge two tables based on people's
> >names, which are not necessarily spelled the same way in both tables
> >(especially the given names).  I would like some function that would
> >help me make the best guess.
>
> >Many thanks in advance!
>
> I used difflib.get_close_matches for something similar:
>
> http://docs.python.org/lib/module-difflib.html
>
> HTH.
>
> --
> Lee Capps
> Technology Specialist
> CTE Resource Center
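
For reference, here is a minimal sketch of the difflib approach Lee
mentions. The sample names and the helper names best_guesses and
similarity are made up for illustration; only get_close_matches and
SequenceMatcher come from the standard library.

```python
import difflib

# Hypothetical sample data standing in for the names in the second table.
candidates = ["Jonathan Peck", "Lee Capps", "Catherine Smith", "Katherine Smyth"]


def best_guesses(name, choices, n=3, cutoff=0.6):
    """Return up to n entries of choices most similar to name, best first.

    cutoff is the minimum similarity ratio (0.0 to 1.0) a choice needs
    in order to be returned at all.
    """
    return difflib.get_close_matches(name, choices, n=n, cutoff=cutoff)


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


if __name__ == "__main__":
    print(best_guesses("Jonathon Peck", candidates))
    print(similarity("Katherine Smyth", "Catherine Smith"))
```

Raising cutoff makes the matching stricter; an empty result means no
candidate was similar enough, which is a useful signal to fall back to
manual review for that row.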

Algorithms typically used for name comparison include Soundex,
NYSIIS, and Levenshtein distance.  The last is the most general, and
variations of it are used in spell checkers.  You can probably find
Python versions with Google.  Implementations of these comparison
functions are also available in the extendedTransforms.py module at
www.spss.com/devcentral (requires a login, but it is free).
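
To give an idea of what Levenshtein distance computes, here is a
minimal pure-Python sketch. It is not the code from
extendedTransforms.py; the function names levenshtein and closest_name
are my own, and the lowercasing in closest_name is just an assumption
about how you would want to compare names.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def closest_name(name, choices):
    """Return the entry of choices with the smallest distance to name,
    ignoring case. Ties go to the earliest entry in choices."""
    return min(choices, key=lambda c: levenshtein(name.lower(), c.lower()))


if __name__ == "__main__":
    print(levenshtein("Jon", "John"))
    print(closest_name("Katherine", ["Catherine", "Kathleen", "Karen"]))
```

For merging tables you would probably want to normalize the distance by
the length of the longer name, so that one typo in a short name counts
for more than one typo in a long one.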

HTH,
Jon Peck


