Looking for library to estimate likeness of two strings

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Wed Feb 6 19:36:00 EST 2008


On Wed, 06 Feb 2008 17:32:53 -0600, Robert Kern wrote:

> Jeff Schwab wrote:
...
>> If the strings happen to be the same length, the Levenshtein distance
>> is equivalent to the Hamming distance.
...
> I'm afraid that it isn't. Using Magnus Lie Hetland's implementation:
...
> In [4]: hamdist('abcdef', 'fabcde')
> Out[4]: 6
> 
> In [5]: levenshtein('abcdef', 'fabcde') 
> Out[5]: 2


I can confirm Robert's results, using a different implementation of the 
Levenshtein edit distance. It isn't true that Levenshtein simplifies to 
Hamming when the strings are equal length.


For what it's worth, here's my implementation, which I wrote some years 
ago. I doubt it's optimal.





def levenshtein(s1, s2):
    """Return the Levenshtein edit distance between two strings.

    s1 and s2 should be two arbitrary length strings. Returns an 
    integer count of the edit distance between the strings.
    """
    m, n = len(s1), len(s2)
    if m == 0: return n
    elif n == 0: return m
    # Create an array with rows 0..m and columns 0..n.
    array = [None]*(m+1)
    for i in range(m+1):
        array[i] = [None]*(n+1)
    # Initialise the first row to 0..n.
    for i in range(n+1):
        array[0][i] = i
    # Initialize the first column to 0..m.
    for i in range(m+1):
        array[i][0] = i
    # Measure the differences.
    for i in range(1, m+1):
        c1 = s1[i-1]
        for j in range(1, n+1):
            c2 = s2[j-1]
            cost = int(c1 != c2)
            x = array[i][j-1] + 1  # Cell immediately above plus one.
            y = array[i-1][j] + 1  # Cell immediately left plus one.
            z = array[i-1][j-1] + cost  # Cell diagonally above and 
            # to the left, plus the cost.
            array[i][j] = min(x, y, z)
    # When done, the bottom-right cell contains the Levenshtein distance.
    return array[m][n]




-- 
Steven



More information about the Python-list mailing list