Comparing 2 similar strings?

John Machin sjmachin at lexicon.net
Thu May 19 00:09:32 EDT 2005


On Wed, 18 May 2005 22:37:21 -0500, Ed Morton <morton at lsupcaemnt.com>
wrote:

>
>
>John Machin wrote:
>
>> On Wed, 18 May 2005 20:03:53 -0500, Ed Morton <morton at lsupcaemnt.com>
>> wrote:
><snip>
>>>I assume you were actually being facetious
>>>and trying to make the point 
>>>that names that don't look the same on paper can have the same soundex 
>>>encoding and that's obviously countered with the fact that soundex is 
>>>just a cheap and cheerful way to find names that probably sound similar 
>>>which can vary tremendously based on ethnicity or accent.
>> 
>> 
>> *If* you want phonetic similarity, there are methods that are much better
>> than soundex, in the sense of fewer false positives and fewer false
>> negatives. Google for NYSIIS, dolby, metaphone, caverphone.
>
>And I assume I'd find they all have pros and cons too, otherwise you'd 
>be referring to THE best one rather than a selection.

*ALL* approximate matching methods have pros and cons -- and all the
others have fewer cons than soundex.

> It seems a bit 
>pointless to go browsing through the documentation on them when someone 
>who presumably already has can't just state the best one for the job.

They were listed in roughly increasing order of general
effectiveness. It depends on the job. It depends on the language. None
of them would work well with your O Muirchaitaeioughs :-)

>
>> Cheap? You get what you pay for.
>> 
>> Cheerful? What's the relevance?
>
>"Cheap and cheerful" is a colloquial expression meaning cost-effective.

Grossly misapplied to soundex.

>
>> Someone who types "Mousaferiadis" into a customer search screen and
>> gets back several lines of McPherson and MacPherson is unlikely to be
>> cheerful -- even before we factor in the speed [soundex divides the
>> universe into a relatively small number of buckets].
>> 
>> Someone who's looking for Erin when they should be looking for Aaron
>> (or vice versa) won't get much cheer out of soundex, either.
>
>That goes back to accent. In [some parts at least of] the USA Erin 
>sounds very much like Aaron whereas in the UK the 2 are very dissimilar. 
>I assume since you apparently consider them similar that you live in 
>the USA

You assume incorrectly. In any case my whereabouts are of sublime
irrelevance. What matters is that some people will, as you say, think
that Aaron and Erin sound similar in the best of circumstances; the
two names are prone to be mistaken one for the other by (say) a tired
call centre operative, especially when the caller and the callee are
from different backgrounds.
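
For anyone who wants to check the examples in this thread, here is a
minimal sketch of the classic American Soundex rules as I understand
them (keep the first letter, code the remaining consonants as digits,
collapse adjacent duplicates, pad to four characters). The function
name soundex and the table name CODES are just names chosen for this
sketch, not any particular library's API.

```python
# Classic Soundex digit groups; vowels, Y, H and W get no digit.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code for name (sketch, ASCII letters only)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    out = []
    prev = CODES.get(first, "")   # the first letter's digit can suppress a duplicate
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not separate duplicate codes
        code = CODES.get(ch, "")  # vowels and Y give "", which resets prev
        if code and code != prev:
            out.append(code)
        prev = code
    return (first + "".join(out) + "000")[:4]


for name in ["Mousaferiadis", "McPherson", "MacPherson", "Aaron", "Erin"]:
    print(name, soundex(name))
# Mousaferiadis M216
# McPherson M216
# MacPherson M216
# Aaron A650
# Erin E650
```

So the three M names land in one bucket (a false positive for the
Mousaferiadis search), while Aaron and Erin differ only in the
preserved first letter and never match. The letter-plus-three-digits
format also caps the number of distinct codes at 26 * 7**3 = 8918,
which is the "small number of buckets" point.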

> and so would consider soundex as providing a "false negative" by 
>saying they don't match. Perhaps one of the other approaches you suggest 
>would report that they do match but that wouldn't make it clearly a 
>better choice to everyone.

None of the other approaches make the mistake of preserving the first
letter -- this alone is almost enough reason for jettisoning soundex.
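
To isolate the effect of the first-letter rule, here is a rough
sketch of a hypothetical variant that uses the same digit groups as
soundex but codes the first letter like every other letter. It is
*not* how NYSIIS, metaphone or the others work; the name
digits_only_key and the width parameter are made up for this
illustration.

```python
# Hypothetical variant for illustration only: Soundex-style digits, but the
# first letter is coded like any other letter instead of being kept as-is.
GROUPS = ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"]
DIGIT = {ch: str(i) for i, group in enumerate(GROUPS, start=1) for ch in group}


def digits_only_key(name, width=3):
    out = []
    prev = ""
    for ch in name.upper():
        if ch in "HW" or not ("A" <= ch <= "Z"):
            continue                 # skip H, W and anything that is not A-Z
        code = DIGIT.get(ch, "")     # vowels and Y map to "" and reset prev
        if code and code != prev:
            out.append(code)
        prev = code
    return ("".join(out) + "0" * width)[:width]


print(digits_only_key("Aaron"), digits_only_key("Erin"))    # 650 650
print(digits_only_key("Kathy"), digits_only_key("Cathy"))   # 230 230
```

With the first letter preserved, each of those pairs gets two
different codes (A650/E650 and K300/C300). Dropping it trades those
false negatives for even bigger buckets, so this is a demonstration
of the problem rather than a recommended replacement.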




