Looking for library to estimate likeness of two strings

Thu Feb 7 08:38:24 EST 2008

Matthew_WARREN at bnpparibas.com wrote:
>> On Wed, 06 Feb 2008 17:32:53 -0600, Robert Kern wrote:
>>
>>> Jeff Schwab wrote:
>> ...
>>>> If the strings happen to be the same length, the Levenshtein distance
>>>> is equivalent to the Hamming distance.
> 
> Is this really what the OP was asking for? If I understand it correctly,
> Levenshtein distance works out the number of edits required to transform
> one string into the target string. The smaller the distance, the closer
> the match, but with the OP's problem I would expect data like this:
> 
> 
> table1      table2
> brian       briam
>             erian
> 
> 
> I think the OP would like to guess at 'briam' rather than 'erian', but
> Levenshtein would rate them as equally good guesses (each is one
> substitution away from 'brian')?
> 
> I know this is pushing it more toward phonetic analysis of the words or
> something similar, and that's orders of magnitude more complex.
> 
> Just in case,
> 
> http://www.linguistlist.org/sp/Software.html#97
> 
> might be a good place to start looking into it, along with the NLTK
> libraries here
> 
> http://nltk.sourceforge.net/index.php/Documentation
> 
You could perhaps use Soundex to choose between different candidates
that have the same Levenshtein distance from the sample. Soundex by
itself is horrible, but it might work as a prioritizer.
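
Here is a rough sketch of that idea, in case it helps. It is only an
illustration under a few assumptions: the function names (levenshtein,
soundex, rank_candidates) are made up for this post, the edit distance
is the plain textbook dynamic-programming version, and the Soundex is
the usual American variant restricted to ASCII letters. Candidates are
sorted by edit distance first, and a Soundex match with the sample is
used only to break ties.

```python
import string

def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute (or match)
        prev = curr
    return prev[-1]

# Consonant groups of American Soundex; vowels, y, h and w have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(word: str) -> str:
    """Four-character Soundex code; non-ASCII letters are ignored (assumption)."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    last = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != last:
            result += code
        last = code               # a vowel resets the run of equal codes
    return (result + "000")[:4]

def rank_candidates(sample: str, candidates: list) -> list:
    """Sort by edit distance; prefer a Soundex match among equal distances."""
    target = soundex(sample)
    return sorted(candidates,
                  key=lambda c: (levenshtein(sample, c), soundex(c) != target))

print(rank_candidates("brian", ["erian", "briam"]))
```

With the names from the quoted example, both candidates are one edit
away from 'brian', but only 'briam' shares its Soundex code, so it is
listed first. Because Soundex always keeps the first letter, this
tie-break is strongly biased toward candidates that start with the
same letter as the sample, which is part of why it is only usable as a
prioritizer.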

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/