Does python have a function like similar_text in PHP

Oleg Broytmann phd at phd.pp.ru
Fri Mar 1 12:02:48 EST 2002


On Fri, Mar 01, 2002 at 08:50:42AM -0800, ron nixon wrote:
> Does anyone know if Python has a function like similar_text in PHP. It
> compares two strings and returns a percentage for the match. Anyone
> know of anything like this in Python and would be willing to send
> along an example?

---
>From mlh at idt.ntnu.no Fri Aug 27 18:06:22 1999
Date: 27 Aug 1999 15:51:03 +0200
From: Magnus L. Hetland <mlh at idt.ntnu.no>
To: python-list at python.org
Newsgroups: comp.lang.python
Subject: Re: Fuzzy string matching
Status: RO
X-Status: 
X-Keywords:
X-UID: 1


Explanation of the distance algorithm...

(Got a request for this, and thought I might as well post it...)

The algorithm:

def distance(a,b):
    c = {}
    n = len(a); m = len(b)

    for i in range(0,n+1):
        c[i,0] = i
    for j in range(0,m+1):
        c[0,j] = j
        
    for i in range(1,n+1):
        for j in range(1,m+1):
            x = c[i-1,j]+1
            y = c[i,j-1]+1
            if a[i-1] == b[j-1]:
                z = c[i-1,j-1]
            else:
                z = c[i-1,j-1]+1
            c[i,j] = min(x,y,z)
    return c[n,m]

It calculates the following: Given two strings, a and b, and three
operations, adding, subtracting and exchanging single characters, what
is the minimal number of steps needed to translate a into b?

The method is based on the following idea:

  We want to find the distance between a[:x] and b[:y]. To do this, we
  first calculate

  1) the distance between a[:x-1] and b[:y], adding the cost of a
     subtract-operation, used to get from a[:x] to a[:z-1];

  2) the distance between a[:x] and b[:y-1], adding the cost of an
     addition-operation, used to get from b[:y-1] to b[:y];

  3) the distance between a[:x-1] and b[:y-1], adding the cost of a
     *possible* exchange of the letter b[y] (with a[x]).

The cost of the subtraction and addition operations are 1, while the
exchange operation has a cost of 1 if a[x] and b[y] are different, and
0 otherwise.

After calculating these costs, we choose the least one of them (since
we want to use the best solution.)

Instead of doing this recursively (i.e. calculating ourselves "back"
from the final value), we build a cost-matrix c containing the optimal
costs, so we can reuse them when calculating the later values. The
costs c[i,0] (from string of length n to empty string) are all i, and
correspondingly all c[0,j] (from empty string to string of length j)
are j.

Finally, the cost of translating between the full strings a and b
(c[n,m]) is returned.

I guess that ought to cover it...

--

  Magnus              Making no sound / Yet smouldering with passion
  Lie          The firefly is still sadder / Than the moaning insect
  Hetland                                       : Minamoto Shigeyuki

-- 
http://www.python.org/mailman/listinfo/python-list

>From mlh at idt.ntnu.no Sun Aug 29 13:39:11 1999
Date: 28 Aug 1999 19:50:04 +0200
From: Magnus L. Hetland <mlh at idt.ntnu.no>
To: python-list at python.org
Newsgroups: comp.lang.python
Subject: Re: Fuzzy string matching?
Status: RO
X-Status: 
X-Keywords:
X-UID: 2


Yet another version, without the append...

def distance(a,b):
    "Calculates the Levenshtein distance between a and b."
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n,m)) space
        a,b = b,a
        n,m = m,n
        
    current = range(n+1)
    for i in range(1,m+1):
        previous, current = current, [i]+[0]*m
        for j in range(1,n+1):
            add, delete = previous[j]+1, current[j-1]+1
            change = previous[j-1]
            if a[j-1] != b[i-1]:
                change = change + 1
            current[j] = min(add, delete, change)
            
    return current[n]


I *could* have written [i]*(m+1) - but I thought this was clearer... :)

--

  Magnus              Making no sound / Yet smouldering with passion
  Lie          The firefly is still sadder / Than the moaning insect
  Hetland                                       : Minamoto Shigeyuki

-- 
http://www.python.org/mailman/listinfo/python-list
---

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.




More information about the Python-list mailing list