Looking for library to estimate likeness of two strings

Tim Chase python.list at tim.thechases.com
Wed Feb 6 17:28:37 EST 2008


> Are there any Python libraries implementing measurement of similarity
> of two strings of Latin characters?

It sounds like you're interested in calculating the Levenshtein 
distance:

http://en.wikipedia.org/wiki/Levenshtein_distance

which gives you a measure of how different they are.  A distance 
of 0 means the inputs are identical; the more the two strings 
differ, the larger the result.

Unfortunately, from my understanding of the code I've seen, it's 
an O(MN) algorithm (where M=len(word1) and N=len(word2)). 
However, it really is the best approximation I've seen of a "how 
similar are these two strings" function.  Googling for

   python levenshtein distance

brings up oodles of hits.
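
For reference, here is a minimal sketch of the usual dynamic 
programming approach, assuming plain Python strings.  The name 
"levenshtein" is just my own label and does not come from any 
particular library.  The nested loops are where the O(MN) cost 
comes from; keeping only one row at a time holds the memory to 
the length of the shorter string.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # Put the longer string in the outer loop so each row is as short as possible.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance from the empty prefix of a to the first j chars of b.
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        # Turning the first i chars of a into an empty string takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if the chars match)
            ))
        previous = current

    return previous[-1]
```

Calling levenshtein("kitten", "sitting") returns 3 (two 
substitutions and one insertion), and identical inputs return 0.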

-tkc





