Looking for library to estimate likeness of two strings

Matthew_WARREN at bnpparibas.com
Thu Feb 7 06:37:19 EST 2008

> On Wed, 06 Feb 2008 17:32:53 -0600, Robert Kern wrote:
>
> > Jeff Schwab wrote:
> ...
> >> If the strings happen to be the same length, the Levenshtein distance
> >> is equivalent to the Hamming distance.

Is this really what the OP was asking for? If I understand it correctly,
Levenshtein distance counts the number of single-character edits needed to
transform one string into the target string. The smaller the distance, the
more alike the strings. But with the OP's problem I would expect something
like this:


table1      table2
brian       briam
            erian


I think the OP would like to guess at 'briam' rather than 'erian', but
wouldn't Levenshtein rate them as equally good guesses? Each is exactly one
substitution away from 'brian'.
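
To check the tie concretely, here is a minimal sketch of the textbook
dynamic-programming Levenshtein distance in plain Python. The function name
levenshtein is only for illustration and is not any particular library's API.

```python
def levenshtein(a, b):
    """Count the insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the prefix of a seen so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


for candidate in ("briam", "erian"):
    print(candidate, levenshtein("brian", candidate))
```

Both candidates come out at distance 1 from 'brian', so edit distance on its
own gives no reason to prefer one over the other.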

I know this is pushing it more toward phonetic analysis of the words or
something similar, and that's orders of magnitude more complex.

Just in case,

http://www.linguistlist.org/sp/Software.html#97

might be a good place to start looking into it, along with the NLTK
libraries here

http://nltk.sourceforge.net/index.php/Documentation
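
For a feel of what even the crudest phonetic approach does, below is a rough
sketch of American Soundex. This is a stand-in assumed here purely for
illustration: it is not taken from the resources linked above, the names
CODES and soundex are made up, and Soundex itself is tuned for English
surnames, so it is nowhere near real phonetic analysis.

```python
# Map each coded consonant to its Soundex digit. Vowels and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return a 4-character Soundex code, or '' if word has no letters."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    last = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
        last = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]  # pad with zeros, keep letter plus 3 digits


for name in ("brian", "briam", "erian"):
    print(name, soundex(name))
```

Here 'brian' and 'briam' both map to B650 while 'erian' maps to E650, but
only because Soundex keeps the first letter and puts m and n in the same
class. That is a long way from a general answer to the OP's problem.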



Matt.
