Percentage matching of text

Bruce Eckel BruceEckel at MailBlocks.com
Fri Jul 30 14:32:14 EDT 2004


Ah, but here's an interesting one:

>>> sm(None, 'abcd', 'abdc').ratio()
0.75

>>> sm(None, 'abcd', 'abxx').ratio()
0.5

So if it matches half the string that's 50%, but if the last two
characters are merely out of order you get an additional 25%.
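
Those two numbers follow from how the difflib documentation defines
ratio(): 2.0*M/T, where M is the number of matched elements and T is the
combined length of both sequences. Here is a minimal sketch that recomputes
the ratio by hand from the matching blocks. The helper name explain_ratio
is made up for illustration; it is not part of difflib.

```python
from difflib import SequenceMatcher

def explain_ratio(a, b):
    """Recompute ratio() by hand as 2*M/T from the matching blocks."""
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a zero-length sentinel block; skip it.
    blocks = [blk for blk in matcher.get_matching_blocks() if blk.size]
    matched = sum(blk.size for blk in blocks)   # M
    total = len(a) + len(b)                     # T
    return blocks, 2.0 * matched / total

for other in ('abdc', 'abxx'):
    blocks, by_hand = explain_ratio('abcd', other)
    pieces = ['abcd'[blk.a:blk.a + blk.size] for blk in blocks]
    print(other, pieces, by_hand)
```

With 'abdc' three of the four characters can be matched in order (M = 3,
T = 8), while with 'abxx' only 'ab' matches (M = 2, T = 8).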

Other examples:

>>> sm(None, 'abcd10', 'abdc20').quick_ratio()
0.83333333333333337
>>> sm(None, 'abcd10', 'abdc20').ratio()
0.66666666666666663
>>> sm(None, 'abcd10', 'abdc20').real_quick_ratio()
1.0

You get a different interpretation for each "speed" of ratio.
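
The difflib documentation describes quick_ratio() and real_quick_ratio()
as upper bounds on ratio(), which would explain why the three numbers go
up as the method gets faster. The sketch below reproduces each value by
hand. The three counting rules are my reading of the CPython
implementation, so treat them as an assumption rather than a
specification.

```python
from collections import Counter
from difflib import SequenceMatcher

a, b = 'abcd10', 'abdc20'
total = len(a) + len(b)

# ratio(): characters that line up in order-preserving matching blocks.
in_order = sum(blk.size
               for blk in SequenceMatcher(None, a, b).get_matching_blocks())

# quick_ratio(): characters the two strings share, ignoring their order.
shared = sum((Counter(a) & Counter(b)).values())

# real_quick_ratio(): lengths only; assumes the shorter string matches fully.
by_length = min(len(a), len(b))

for name, matched in (('ratio', in_order),
                      ('quick_ratio', shared),
                      ('real_quick_ratio', by_length)):
    print(name, matched, 2.0 * matched / total)
```

Under that reading the matched counts are 4, 5 and 6 out of a combined
length of 12, which lines up with the three results shown above.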

I started thinking that all I wanted was a pass-fail kind of thing, so
I wondered if real_quick_ratio() might do the trick. But look at the
following experiments:

>>> sm(None, 'abcd10', 'abdc20').real_quick_ratio()
1.0
>>> sm(None, 'abcd10', 'abxx20').real_quick_ratio()
1.0
>>> sm(None, 'abcd10', 'abxx24').real_quick_ratio()
1.0
>>> sm(None, 'abcd10', 'anxx24').real_quick_ratio()
1.0
>>> sm(None, 'abcd10', 'qnxx24').real_quick_ratio()
1.0

It seems like there's no way to get real_quick_ratio() to say anything
except "it's a perfect match!" I'm wondering if someone didn't leave a
code stub unwritten:

def real_quick_ratio(self): return 1.0
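
One thing to notice about the experiments above: every pair of strings
has the same length. If real_quick_ratio() is an upper bound computed
from the lengths alone (as the documentation and the CPython source
suggest), it can only drop below 1.0 when the lengths differ. The sketch
below checks that, and also shows one way to get a pass-fail answer by
trying the cheap bounds first and falling back to ratio(). The function
name passes and the 0.83 default are hypothetical, taken from the 83%
example in the original question.

```python
from difflib import SequenceMatcher

def passes(control, sample, threshold=0.83):
    """Hypothetical pass/fail check: cheap upper bounds first, ratio() last."""
    matcher = SequenceMatcher(None, control, sample)
    # If even an upper bound is below the threshold, ratio() must be too.
    if matcher.real_quick_ratio() < threshold:
        return False
    if matcher.quick_ratio() < threshold:
        return False
    return matcher.ratio() >= threshold

# Different lengths: real_quick_ratio() is no longer 1.0.
print(SequenceMatcher(None, 'abcd10', 'ab').real_quick_ratio())  # 0.5

print(passes('abcd10', 'abcd10'))  # True
print(passes('abcd10', 'qnxx24'))  # False
```

So real_quick_ratio() on its own is not a usable pass-fail test, but it
can serve as an early exit in front of the slower methods.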



Friday, July 30, 2004, 11:01:28 AM, you wrote:

> [Bruce Eckel]
> ...
>> What I'd like to do is find an algorithm that produces the results of
>> a text comparison as a percentage-match. Thus I would be able to
>> assert that my test samples must match the control sample by at least
>> (for example) 83% for the test to pass.

> >>> from difflib import SequenceMatcher as sm
> >>> sm(None, 'abc', 'xyz').ratio()
> 0.0
> >>> sm(None, 'abcd', 'abcd').ratio()
> 1.0
> >>> sm(None, 'abcd', 'uvwx').ratio()
> 0.0
> >>> sm(None, 'abcd', 'axyd').ratio()
> 0.5
> >>>

> SequenceMatcher works on sequences of hashable elements.  Above, it's
> working on sequences of characters (aka "strings" <wink>).  Other
> possibilities include sequences of lines ("files") and lists of
> integers.
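
To make the point about other element types concrete, here is a small
sketch with a list of integers and a list of lines. The sample data is
invented for illustration.

```python
from difflib import SequenceMatcher

# Lists of integers: each integer is one element.
int_ratio = SequenceMatcher(None, [1, 2, 3, 4], [1, 2, 4, 3]).ratio()
print(int_ratio)   # 0.75

# Lists of lines: a whole line either matches or it does not.
control = "alpha\nbeta\ngamma\ndelta\n".splitlines()
sample = "alpha\nbeta\nGAMMA\ndelta\n".splitlines()
line_ratio = SequenceMatcher(None, control, sample).ratio()
print(line_ratio)  # 0.75
```

When comparing lines, a one-character change costs the whole line, so
the percentage is coarser than a character-level comparison of the same
text.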


Bruce Eckel    http://www.BruceEckel.com   mailto:BruceEckel at MailBlocks.com
Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e"
Web log: http://www.mindview.net/WebLog
Subscribe to my newsletter:
http://www.mindview.net/Newsletter
My schedule can be found at:
http://www.mindview.net/Calendar

"The whole problem with the world is that fools and fanatics are always
so certain of themselves, and wiser people so full of doubts."
  --Bertrand Russell




