String comparison

S.Selvam Siva s.selvamsiva at gmail.com
Mon Jan 26 01:30:05 EST 2009


Thank You Gabriel,

On Sun, Jan 25, 2009 at 7:12 AM, Gabriel Genellina
<gagsl-py2 at yahoo.com.ar> wrote:

> On Sat, 24 Jan 2009 15:08:08 -0200, S.Selvam Siva <s.selvamsiva at gmail.com>
> wrote:
>
>
>> I am developing a spell checker for my local language (Tamil) using Python.
>> I need to generate a list of alternative words for a misspelled word from the
>> dictionary of words. The alternatives must be as close as possible to the
>> misspelled word. As we know, ordinary string comparison won't work here.
>> Any suggestion for this problem is welcome.
>>
>
> I think it would be better to add Tamil support to some existing library like
> GNU aspell: http://aspell.net/



That was my earlier plan, but I am not sure how aspell integrates with other
editors. I had better ask about it on the aspell mailing list.


> You are looking for "fuzzy matching":
> http://en.wikipedia.org/wiki/Fuzzy_string_searching
> In particular, the Levenshtein distance is widely used; I think there is a
> Python extension providing those calculations.
>
> --
> Gabriel Genellina
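
For a quick first experiment that needs no extension module, the standard
library's difflib offers get_close_matches. Note that it ranks candidates with
SequenceMatcher's similarity ratio, which is not the Levenshtein distance, so
treat this only as a rough sketch. The word list and the helper name
close_matches are made up for illustration.

```python
import difflib

# Hypothetical stand-in for the real dictionary of words.
WORDS = ["spelling", "spewing", "sapling", "selling", "banana"]


def close_matches(word, words, n=3, cutoff=0.6):
    """Return up to n words whose similarity ratio to `word` is >= cutoff.

    The result is ordered best match first. This is difflib's own ratio,
    not an edit distance.
    """
    return difflib.get_close_matches(word, words, n=n, cutoff=cutoff)


if __name__ == "__main__":
    print(close_matches("speling", WORDS))
```

Lowering cutoff admits weaker matches and raising n returns more of them.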

The following code served my purpose (thanks to some unknown contributors):
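import sys

def distance(a, b):
    # Levenshtein distance: c[i, j] is the edit distance between a[:i] and b[:j].
    c = {}
    n = len(a); m = len(b)

    # Turning a prefix into the empty string (or back) costs its length.
    for i in range(0, n+1):
        c[i, 0] = i
    for j in range(0, m+1):
        c[0, j] = j

    for i in range(1, n+1):
        for j in range(1, m+1):
            x = c[i-1, j] + 1        # delete a[i-1]
            y = c[i, j-1] + 1        # insert b[j-1]
            if a[i-1] == b[j-1]:
                z = c[i-1, j-1]      # characters match, no extra cost
            else:
                z = c[i-1, j-1] + 1  # substitute a[i-1] with b[j-1]
            c[i, j] = min(x, y, z)
    return c[n, m]

a = sys.argv[1]
b = sys.argv[2]
d = distance(a, b)
print("d= %d" % d)
longer = float(max(len(a), len(b)))
shorter = float(min(len(a), len(b)))
# Two empty strings are identical; this also avoids dividing by zero.
r = ((longer - d) / longer) * (shorter / longer) if longer else 1.0
# r ranges between 0 and 1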
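
To get from a single distance to a list of alternatives, one option is to
score every dictionary word with the same r and keep the best few. The sketch
below assumes the dictionary fits in memory as a plain list. The names
similarity and suggest are hypothetical helpers, not part of any library.
distance is restated here with only two rows of the table so the block runs on
its own.

```python
def distance(a, b):
    """Levenshtein distance, keeping only the previous row of the table."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i]
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[len(b)]


def similarity(a, b):
    """The r score from the script above; 1.0 means identical strings."""
    longer = float(max(len(a), len(b)))
    shorter = float(min(len(a), len(b)))
    if longer == 0:
        return 1.0
    return ((longer - distance(a, b)) / longer) * (shorter / longer)


def suggest(word, words, limit=5):
    """Hypothetical helper: up to `limit` candidates, highest score first.

    Ties are broken alphabetically so the order is deterministic.
    """
    ranked = sorted(words, key=lambda w: (-similarity(word, w), w))
    return ranked[:limit]


if __name__ == "__main__":
    words = ["spelling", "spewing", "sapling", "selling", "banana"]
    print(suggest("speling", words, limit=3))
```

Because r also multiplies by shorter / longer, a candidate of the same length
can outrank one that differs only by a single inserted letter. Scanning the
whole dictionary costs one distance computation per word, so a large word list
may need pre-filtering, for example by length. For Tamil, the strings should
be Unicode strings, and since one visible character can consist of several
code points, the distance counts code points rather than visible characters.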




-- 
Yours,
S.Selvam