Comparing 2 similar strings?
Rob W. W. Hooft
news at rob.hooft.net
Sat May 28 16:51:59 EDT 2005
After reading this thread, I have wrapped up a different approach,
probably not what you were looking for, but it is very good for what I
wanted: comparing a command string typed by a user with all possible
commands a program can accept, to be able to do typo-correction. The
method will return a floating point number between 0.0 (no match) and
len(s)+1.0 (if the strings are exactly equal). You can see the score as
"the number of equal letters".
I put the module at http://starship.python.net/crew/hooft/
It is called comparestrings.py.
It is a comparison algorithm as implemented by Reinhard Schneider and
Chris Sander for comparison of protein sequences, but implemented to
compare two ASCII strings.
The comparison makes use of a similarity matrix for letters: in the
protein case this is normally a chemical functionality similarity, for
our case this is a matrix based on keys next to each other on a US
Qwerty keyboard and on "capital letters are similar to their lowercase
equivalent"
The algorithm does not cut corners: it is sure to find the absolute best
match between the two strings.
No attempt has been made to make this efficient in time or memory. Time
taken and memory used is proportional to the product of the lengths of
the input strings. Use for strings longer than 25 characters is entirely
for your own risk.
Regards,
Rob Hooft
More information about the Python-list
mailing list