Comparing 2 similar strings?

John Machin sjmachin at lexicon.net
Wed May 18 16:38:59 EDT 2005


On Wed, 18 May 2005 15:06:53 -0500, Ed Morton <morton at lsupcaemnt.com>
wrote:

>
>
>William Park wrote:
>
>> How do you compare 2 strings, and determine how "close" they are to
>> each other?  Eg.
>>     aqwerty
>>     qwertyb
>> are similar to each other, except for first/last char.  But, how do I
>> quantify that?
>> 
>> I guess you can say for the above 2 strings that
>>     - at max, 6 chars out of 7 are same sequence --> 85% max
>> 
>> But, for
>>     qawerty
>>     qwerbty
>> max correlation is
>>     - 3 chars out of 7 are the same sequence --> 42% max
>> 
>> (Crossposted to 3 of my favourite newsgroups.)
>>
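
The measure described above (length of the longest shared run of
characters, divided by the string length) can be sketched in Python with
the standard difflib module. This is only an illustration, not a
recommended metric; the names longest_common_run and run_ratio are made up
here, and dividing by the longer string's length is an assumption for
strings of unequal length.

```python
from difflib import SequenceMatcher


def longest_common_run(a, b):
    """Return the longest contiguous block of characters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def run_ratio(a, b):
    """Length of the longest shared run divided by the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return len(longest_common_run(a, b)) / longest


for s1, s2 in [("aqwerty", "qwertyb"), ("qawerty", "qwerbty")]:
    run = longest_common_run(s1, s2)
    # Percentages are truncated, as in the figures quoted above.
    print(s1, s2, repr(run), "%d%%" % int(run_ratio(s1, s2) * 100))
# aqwerty qwertyb 'qwerty' 85%
# qawerty qwerbty 'wer' 42%
```

Note that SequenceMatcher.ratio() is a different measure: it counts all
matching blocks, not just the longest one.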
>
>"However you like" is probably the right answer, but one way might be to 
>compare their soundex encoding 
>(http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?soundex) and figure out 
>percentage difference based on comparing the numeric part.
>

Fantastic suggestion. Here's a tiny piece of real-life test data:

Compare the surnames "Mousaferiadis" and "McPherson": both have the
Soundex code M216.
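
To see why this pair is awkward for the Soundex idea, here is a minimal
sketch of the usual American Soundex rules (keep the first letter, code
the remaining consonants as digits, collapse adjacent equal codes, let
h and w pass through without separating them, pad or cut to four
characters). The function name soundex is just a label for this sketch,
not a standard-library call.

```python
# Map each coded consonant to its Soundex digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code of name ('' if it has no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate equal codes
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


print(soundex("Mousaferiadis"), soundex("McPherson"))
# M216 M216
```

Both surnames get the same code, so any percentage computed from the
numeric part would rate them as a perfect match.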
