Fuzzy Lookups
Diez B. Roggisch
deets at nospam.web.de
Mon Jan 30 11:30:06 EST 2006
Fredrik Lundh wrote:
> Diez B. Roggisch wrote:
>
>> The advantage becomes apparent when you try to e.g. compare
>>
>> "Angelina Jolie"
>>
>> with
>>
>> "AngelinaJolei"
>>
>> and
>>
>> "Bob"
>>
>> Both have a l-dist of 3
>
> >>> distance("Angelina Jolie", "AngelinaJolei")
> 3
> >>> distance("Angelina Jolie", "Bob")
> 13
>
> what did I miss ?
Hmm. I missed something - the "1" before the "3" in 13 when I looked at my
terminal after running the example. And according to
http://www.reference.com/browse/wiki/Levenshtein_distance
it has the property
"""It is always at least the difference of the sizes of the two strings."""
And I got my implementation from there (or rather from Magnus Lie Hetland,
whose Python version is referenced there).
So you are right, my example is crap.
But I ran into cases where my normalizing made sense - otherwise I wouldn't
have done it :)
I guess it is more along the lines of (made-up example)
"abcdef"
compared to
"abcefd"
and
"abcd"
I can only say that I used it to fuzzy-compare people's and hotel names, and
applying the normalization made my results far better.
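A rough sketch of the idea, not the code actually used here: the made-up
strings above both have a plain distance of 2 from "abcdef", so the raw number
cannot tell a shuffled tail from a truncated name. The function names and the
choice to divide by the combined length of both strings are assumptions for
illustration only; the post does not say which normalization was applied.

```python
def levenshtein(a, b):
    """Plain edit distance: insertions, deletions, substitutions cost 1 each."""
    if len(a) < len(b):
        a, b = b, a  # keep b the shorter string so each row stays small
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Hypothetical normalization: distance divided by the combined length.

    Assumption for illustration; other denominators are equally plausible.
    """
    total = len(a) + len(b)
    return levenshtein(a, b) / total if total else 0.0


if __name__ == "__main__":
    for other in ("abcefd", "abcd"):
        print(other,
              levenshtein("abcdef", other),
              normalized_distance("abcdef", other))
```

With this denominator the shuffled "abcefd" scores lower (closer) than the
truncated "abcd", although their raw distances are equal. Dividing by the
longer string's length alone would score both pairs at 2/6 in this example.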
Sorry to cause confusion.
Diez