Fuzzy Lookups
Diez B. Roggisch
deets at nospam.web.de
Mon Jan 30 10:54:30 EST 2006
> As mentioned above this works quite well and I am happy with it, but I
> wonder if there is a more Pythonic way of doing this type of lookup?
I did a levenshtein-fuzzy-search myself, however I enhanced my version by
normalizing the distance the following way:
def relative(a, b):
"""
Computes a relative distance between two strings. Its in the range
(0-1] where 1 means total equality.
@type a: string
@param a: arg one
@type b: string
@param b: arg two
@rtype: float
@return: the distance
"""
d = distance(a,b)
longer = float(max((len(a), len(b))))
shorter = float(min((len(a), len(b))))
r = ((longer - d) / longer) * (shorter / longer)
return r
distance is a run-of-the mill levenshtein-distance as yours.
The advantage becomes apparent when you try to e.g. compare
"Angelina Jolie"
with
"AngelinaJolei"
and
"Bob"
Both have a l-dist of 3, but the normalized distance is
0.729591836735
and
0.015306122449
respectively - making you chose the better match :)
regards,
Diez
More information about the Python-list
mailing list