Comparing 2 similar strings?

Rob W. W. Hooft news at rob.hooft.net
Sat May 28 16:51:59 EDT 2005


After reading this thread, I have wrapped up a different approach, 
probably not what you were looking for, but it is very good for what I 
wanted: comparing a command string typed by a user with all possible 
commands a program can accept, to be able to do typo-correction. The 
method will return a floating point number between 0.0 (no match) and 
len(s)+1.0 (if the strings are exactly equal). You can see the score as 
"the number of equal letters".

I put the module at http://starship.python.net/crew/hooft/
It is called comparestrings.py.

It is a comparison algorithm as implemented by Reinhard Schneider and 
Chris Sander for comparison of protein sequences, but implemented to 
compare two ASCII strings.

The comparison makes use of a similarity matrix for letters: in the 
protein case this is normally a chemical functionality similarity, for 
our case this is a matrix based on keys next to each other on a US 
Qwerty keyboard and on "capital letters are similar to their lowercase 
equivalent"

The algorithm does not cut corners: it is sure to find the absolute best 
match between the two strings.

No attempt has been made to make this efficient in time or memory. Time 
taken and memory used is proportional to the product of the lengths of 
the input strings. Use for strings longer than 25 characters is entirely 
for your own risk.

Regards,

Rob Hooft



More information about the Python-list mailing list