Fuzzy Lookups

Diez B. Roggisch deets at nospam.web.de
Mon Jan 30 10:54:30 EST 2006


> As mentioned above this works quite well and I am happy with it, but I
> wonder if there is a more Pythonic way of doing this type of lookup?

I did a levenshtein-fuzzy-search myself, however I enhanced my version by
normalizing the distance the following way:

def relative(a, b):
    """
    Computes a relative distance between two strings. Its in the range
    (0-1] where 1 means total equality.
    @type a: string
    @param a: arg one
    @type b: string
    @param b: arg two
    @rtype: float
    @return: the distance
    """
    d = distance(a,b)
    longer = float(max((len(a), len(b))))
    shorter = float(min((len(a), len(b))))    
    r = ((longer - d) / longer) * (shorter / longer)
    return r


distance is a run-of-the mill levenshtein-distance as yours.

The advantage becomes apparent when you try to e.g. compare

"Angelina Jolie"

with

"AngelinaJolei"

and 

"Bob"

Both have a l-dist of 3, but the normalized distance is


0.729591836735

and 

0.015306122449

respectively - making you chose the better match :)

regards,

Diez




More information about the Python-list mailing list