Looking for library to estimate likeness of two strings

Jeff Schwab jeff at schwabcenter.com
Wed Feb 6 17:56:55 EST 2008


Tim Chase wrote:
>> Are there any Python libraries implementing measurement of similarity
>> of two strings of Latin characters?
> 
> It sounds like you're interested in calculating the Levenshtein distance:
> 
> http://en.wikipedia.org/wiki/Levenshtein_distance
> 
> which gives you a measure of how different they are.  A measure of "0" 
> is that the inputs are the same.  The more different the two strings 
> are, the greater the resulting output of the function.
> 
> Unfortunately, it's an O(MN) algorithm (where M=len(word1) and 
> N=len(word2)) from my understanding of the code I've seen. However it 
> really is the best approximation I've seen of a "how similar are these 
> two strings" function.  Googling for
> 
>   python levenshtein distance
> 
> brings up oodles of hits.

If the strings happen to be the same length, the Levenshtein distance is 
equivalent to the Hamming distance.  The Wikipedia article gives the 
following Python implementation:

# http://en.wikipedia.org/wiki/Hamming_distance
def hamdist(s1, s2):
     assert len(s1) == len(s2)
     return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))



More information about the Python-list mailing list