Percentage matching of text

Tim Peters tim.peters at gmail.com
Fri Jul 30 13:01:28 EDT 2004


[Bruce Eckel]
...
> What I'd like to do is find an algorithm that produces the results of
> a text comparison as a percentage-match. Thus I would be able to
> assert that my test samples must match the control sample by at least
> (for example) 83% for the test to pass.

>>> from difflib import SequenceMatcher as sm
>>> sm(None, 'abc', 'xyz').ratio()
0.0
>>> sm(None, 'abcd', 'abcd').ratio()
1.0
>>> sm(None, 'abcd', 'uvwx').ratio()
0.0
>>> sm(None, 'abcd', 'axyd').ratio()
0.5
>>>
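
For reference, the difflib docs define ratio() as 2.0*M/T, where M is the
number of matched elements and T is the total number of elements in both
sequences. A minimal sketch that recomputes the 0.5 above by hand; the
helper name manual_ratio is made up for illustration and is not part of
difflib:

```python
from difflib import SequenceMatcher

def manual_ratio(a, b):
    """Recompute ratio() as 2*M/T from the matching blocks (illustrative helper)."""
    matcher = SequenceMatcher(None, a, b)
    # Each block is (a_index, b_index, size); the last one is a zero-size sentinel.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # difflib itself reports 1.0 when both sequences are empty.
    return 2.0 * matched / total if total else 1.0

# 'a' and 'd' match, so M = 2 and T = 8.
print(manual_ratio('abcd', 'axyd'))
```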

SequenceMatcher works on sequences of hashable elements.  Above, it's
working on sequences of characters (aka "strings" <wink>).  Other
possibilities include sequences of lines ("files") and lists of
integers.
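
A rough sketch of how the percentage check from the question might look,
using the same ratio() call on lists of lines and on lists of integers.
The helper name percent_match, the sample data, and the THRESHOLD constant
are assumptions made up for illustration:

```python
from difflib import SequenceMatcher

def percent_match(control, sample):
    """Return the similarity of two sequences as a percentage (illustrative helper)."""
    return 100.0 * SequenceMatcher(None, control, sample).ratio()

# Sequences of lines: each whole line is one element.
control = "alpha\nbeta\ngamma\ndelta\n".splitlines()
sample = "alpha\nbeta\nGAMMA\ndelta\n".splitlines()
print(percent_match(control, sample))  # 3 of 4 lines match: 75.0

# Lists of integers work the same way.
print(percent_match([1, 2, 3, 4], [1, 2, 9, 4]))  # 75.0

# The kind of check asked about: fail the test below a chosen threshold.
THRESHOLD = 83.0  # the example figure from the question
assert percent_match("abcdefgh", "abcdefgx") >= THRESHOLD  # 7 of 8 chars: 87.5
```

Note that when the elements are lines, a line that differs by a single
character counts as a complete mismatch, so the same pair of files can
score quite differently depending on whether you compare them as lists of
lines or as one long string.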


