Levenshtein word comparison - performance issue

S.Selvam Siva s.selvamsiva at gmail.com
Fri Feb 13 05:16:00 EST 2009


Hi all,

I need some help.
I am trying to find the top n (e.g. 5) most similar words for a given word,
from a dictionary of 50,000 words.
I am using the python-Levenshtein module, and my sample code is as follows.

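import Levenshtein

# foo() is a method of a class that keeps the word list in
# self.dictionary_words.
def foo(self, searchword):
    # Map each dictionary word to its similarity ratio with searchword.
    disdict = {}
    for word in self.dictionary_words:
        distance = Levenshtein.ratio(searchword, word)
        disdict[word] = distance
    # Sort the words of disdict by their ratio, in descending order.
    similarwords = sorted(disdict, key=disdict.__getitem__, reverse=True)

    return similarwords[:5]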

foo() takes a search word, compares it with each of the 50,000 dictionary
words, and assigns each word a value (between 0 and 1).
After sorting in descending order, it returns the top 5 similar words.
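
One direction that might help with the sorting step is sketched below: it
selects the top n with heapq.nlargest instead of sorting all 50,000 scores.
This is only a rough sketch, not a measured fix. The names top_similar and
difflib_ratio are made up for illustration, and difflib's SequenceMatcher is
used as a standard-library stand-in scorer so the snippet runs without extra
modules. Its ratio is not identical to Levenshtein.ratio, which can be passed
in through the score argument instead.

```python
import heapq
from difflib import SequenceMatcher


def difflib_ratio(a, b):
    # Stand-in scorer from the standard library (an assumption for this
    # sketch); returns a float between 0 and 1.
    return SequenceMatcher(None, a, b).ratio()


def top_similar(searchword, words, n=5, score=difflib_ratio):
    # heapq.nlargest keeps only the n best candidates while scanning,
    # so the full list of scores is never sorted.
    return heapq.nlargest(n, words, key=lambda word: score(searchword, word))


# Tiny made-up word list, just to show the call.
words = ["apple", "apply", "ample", "maple", "banana", "cherry"]
print(top_similar("apple", words, n=3))
```

With python-Levenshtein installed, the call would be
top_similar(searchword, words, score=Levenshtein.ratio). Most of the time is
probably still spent computing the 50,000 ratios per search word, so this only
trims the selection step.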

The problem is that it takes a long time to process (as I need to pass many
search words within a loop). I guess the code could be improved to work
more efficiently. Your suggestions are welcome.
-- 
Yours,
S.Selvam


More information about the Python-list mailing list