Comparing 2 similar strings?
John Machin
sjmachin at lexicon.net
Wed May 18 16:38:59 EDT 2005
On Wed, 18 May 2005 15:06:53 -0500, Ed Morton <morton at lsupcaemnt.com>
wrote:
>
>
>William Park wrote:
>
>> How do you compare 2 strings, and determine how much they are "close" to
>> each other? Eg.
>> aqwerty
>> qwertyb
>> are similar to each other, except for first/last char. But, how do I
>> quantify that?
>>
>> I guess you can say for the above 2 strings that
>> - at max, 6 chars out of 7 are same sequence --> 85% max
>>
>> But, for
>> qawerty
>> qwerbty
>> max correlation is
>> - 3 chars out of 7 are the same sequence --> 42% max
>>
>> (Crossposted to 3 of my favourite newsgroups.)
>>
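The measure described in the question is the length of the longest contiguous run of characters the two strings share (the longest common substring), divided by the string length. The sketch below computes it with the standard library's difflib.SequenceMatcher.find_longest_match. The function name longest_run_score and the choice to divide by the longer string when lengths differ are assumptions made for illustration, not part of the question.

```python
from difflib import SequenceMatcher


def longest_run_score(a: str, b: str) -> float:
    """Longest common contiguous run of a and b, as a fraction of the
    longer string's length (normalisation by the longer string is an
    assumption; the question only has equal-length examples)."""
    if not a and not b:
        return 1.0
    matcher = SequenceMatcher(None, a, b)
    # Explicit bounds also work on Pythons before 3.9, where
    # find_longest_match() had no default arguments.
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


for a, b in [("aqwerty", "qwertyb"), ("qawerty", "qwerbty")]:
    print(a, b, f"{longest_run_score(a, b):.0%}")
```

For the two pairs above this gives 6/7 ("qwerty") and 3/7 ("wer"), i.e. about 85.7% and 42.9%; the question's 85% and 42% are those fractions truncated. difflib's own SequenceMatcher.ratio() is a related but different score: it counts every matching block, not only the longest one.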
>
>"However you like" is probably the right answer, but one way might be to
>compare their soundex encoding
>(http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?soundex) and figure out
>percentage difference based on comparing the numeric part.
>
Fantastic suggestion. Here's a tiny piece of real-life test data:
compare the surnames "Mousaferiadis" and "McPherson".
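To make the test concrete, here is a minimal sketch of the standard American Soundex encoding: keep the first letter, map the remaining consonants to digit groups, drop vowels, collapse repeated digits, and pad or truncate to four characters. The function name soundex is hypothetical, not a particular library's API. Run on the two surnames, both encode to M216, so any percentage derived from "comparing the numeric part" would rate them identical.

```python
# Standard American Soundex digit groups.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        raise ValueError("name must contain at least one letter")
    code = letters[0].upper()          # the first letter is kept (upper-cased)
    prev = _CODES.get(letters[0], "")  # its digit still suppresses an immediate repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h and w do not separate equal codes
        digit = _CODES.get(ch, "")     # vowels and y map to ""
        if digit and digit != prev:
            code += digit
        prev = digit                   # a vowel resets prev, so equal codes around it both count
        if len(code) == 4:
            break
    return code.ljust(4, "0")          # pad short codes with zeros


for surname in ("Mousaferiadis", "McPherson"):
    print(surname, soundex(surname))
```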