Levenshtein word comparison - performance issue
S.Selvam Siva
s.selvamsiva at gmail.com
Fri Feb 13 05:16:00 EST 2009
Hi all,
I need some help.
I am trying to find the top n (e.g. 5) most similar words for a given word, from a
dictionary of 50,000 words.
I used the python-levenshtein module, and sample code is as follows.
    import Levenshtein

    def foo(self, searchword):
        # Score every dictionary word against the search word (ratio is in [0, 1]).
        disdict = {}
        for word in self.dictionary_words:
            distance = Levenshtein.ratio(searchword, word)
            disdict[word] = distance
        # Sort the disdict dictionary by values in descending order.
        similarwords = sorted(disdict, key=disdict.__getitem__, reverse=True)
        return similarwords[:5]
foo() takes a search word, compares it with each of the 50,000 dictionary words, and
assigns each word a value between 0 and 1.
After sorting in descending order, it returns the top 5 similar words.
The problem is that it takes a long time to process (as I need to pass many
search words in a loop). I guess the code could be improved to work
more efficiently. Your suggestions are welcome.
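For reference, here is a minimal sketch of the same top-n selection written with heapq.nlargest, which avoids building and fully sorting a 50,000-entry score table. The names top_similar and stand_in_ratio are hypothetical, and stand_in_ratio uses difflib from the standard library only so that the sketch runs without python-levenshtein; its scores are not the same as Levenshtein.ratio. Passing score=Levenshtein.ratio should give the original scoring. The ratio calls themselves probably dominate the run time, so this may help only modestly.

```python
import heapq
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    """Stand-in similarity in [0, 1] (assumption: not Levenshtein.ratio itself)."""
    return SequenceMatcher(None, a, b).ratio()


def top_similar(searchword, words, n=5, score=stand_in_ratio):
    """Return the n words most similar to searchword, best first."""
    # heapq.nlargest tracks only the n best candidates while scanning,
    # so no full score dictionary is built and no full sort is done.
    return heapq.nlargest(n, words, key=lambda w: score(searchword, w))


# Tiny hypothetical word list, standing in for the 50,000-word dictionary.
words = ["apple", "apply", "ample", "maple", "banana", "grape"]
print(top_similar("appel", words, n=3))
```

With the real module installed, the call would look like top_similar(searchword, dictionary_words, score=Levenshtein.ratio).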
--
Yours,
S.Selvam