Comparing 2 similar strings?

Ed Morton morton at lsupcaemnt.com
Wed May 18 23:37:21 EDT 2005



John Machin wrote:

> On Wed, 18 May 2005 20:03:53 -0500, Ed Morton <morton at lsupcaemnt.com>
> wrote:
<snip>
>>I assume you were actually being facetious
>>and trying to make the point 
>>that names that don't look the same on paper can have the same soundex 
>>encoding and that's obviously countered with the fact that soundex is 
>>just a cheap and cheerful way to find names that probably sound similar 
>>which can vary tremendously based on ethnicity or accent.
> 
> 
> *If* you want phonetic similarity, there are methods that are much better
> than soundex, in the sense of fewer false positives and fewer false
> negatives. Google for NYSIIS, dolby, metaphone, caverphone.

And I assume I'd find they all have pros and cons too, otherwise you'd 
be referring to THE best one rather than a selection. It seems a bit 
pointless to go browsing through the documentation on them when someone 
who presumably already has can't just state the best one for the job.

> Cheap? You get what you pay for.
> 
> Cheerful? What's the relevance?

"Cheap and cheerful" is a colloquial expression meaning cost-effective.

> Someone who types "Mousaferiadis" into a customer search screen and
> gets back several lines of McPherson and MacPherson is unlikely to be
> cheerful -- even before we factor in the speed [soundex divides the
> universe into a relatively small number of buckets].
> 
> Someone who's looking for Erin when they should be looking for Aaron
> (or vice versa) won't get much cheer out of soundex, either.

That goes back to accent. In [some parts at least of] the USA Erin 
sounds very much like Aaron whereas in the UK the two are very dissimilar. 
I assume since you apparently consider them similar that you live in 
the USA and so would consider soundex as providing a "false negative" by 
saying they don't match. Perhaps one of the other approaches you suggest 
would report that they do match but that wouldn't make it clearly a 
better choice to everyone.
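
To make the examples above concrete, here is a minimal sketch of classic
American Soundex in Python. It is an illustration only, not code from this
thread; the function name soundex() and the _CODES table are hypothetical
names chosen for the sketch. Under these rules Erin and Aaron differ only in
the retained first letter, while Mousaferiadis lands in the same bucket as
McPherson and MacPherson.

```python
# Letter-to-digit table for classic American Soundex (assumed variant).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code for name ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                      # first letter is kept verbatim
    prev = _CODES.get(letters[0], "")      # but its digit still suppresses repeats
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":
            # Vowels reset the previous digit; H and W leave it unchanged.
            prev = digit
    return (code + "000")[:4]              # pad or truncate to 4 characters


for n in ("Erin", "Aaron", "Mousaferiadis", "McPherson", "MacPherson"):
    print(n, soundex(n))
# Erin E650
# Aaron A650
# Mousaferiadis M216
# McPherson M216
# MacPherson M216
```

Because the first letter is kept as-is, any pair of names that start with
different letters can never match, however alike they sound.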

> 
>>It's a reasonable approach to consider given the very loose requirements 
>>presented.
> 
> 
> Soundex is *NEVER* a reasonable approach to consider. Phonetic
> variation is only one consideration. In any case, the OP didn't appear
> to be concerned with phonetic variations.

The OP didn't say what the application was at all, but you're right that 
from his example he does SEEM more interested in character matches than 
phonetic ones so he'd presumably quickly discard phonetic comparisons if 
that's really not what he wants.
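
If character matches are what matters, one possible starting point is the
standard library's difflib. The sketch below is an assumption about what
such a comparison could look like, not the OP's code; char_similarity() is
a hypothetical helper name, and the sample names are borrowed from earlier
in this message.

```python
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Return a 0.0-1.0 score based on matching character runs."""
    # First argument None means no characters are treated as junk.
    return SequenceMatcher(None, a, b).ratio()


# Names that look alike on paper score high; names that merely share a
# soundex bucket do not.
print(char_similarity("McPherson", "MacPherson"))
print(char_similarity("McPherson", "Mousaferiadis"))
```

The comparison is case-sensitive, so lower-casing both strings first may be
appropriate depending on the data.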

	Ed.


