Percentage matching of text
Bruce Eckel
BruceEckel at MailBlocks.com
Fri Jul 30 14:32:14 EDT 2004
Ah, but here's an interesting one:
>>> sm(None, 'abcd', 'abdc').ratio()
0.75
>>> sm(None, 'abcd', 'abxx').ratio()
0.5
So if it matches half the string it's 50%, but if the last two
characters are merely out of order, one of them still counts as
matched and that's an additional 25%.
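To see where those two numbers come from, here is a minimal sketch that assumes the formula given in the difflib documentation, ratio() = 2.0*M/T, where M is the number of matched elements and T is the combined length of both sequences. The helper name explain_ratio is made up for illustration; it is not part of difflib.

```python
from difflib import SequenceMatcher


def explain_ratio(a, b):
    """Return (matched_count, total_length, ratio) for two sequences.

    Hypothetical helper: recomputes ratio() by hand from the matching blocks.
    """
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() yields (a_index, b_index, size) triples; the last
    # one is a zero-length sentinel, so summing sizes counts matched elements.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    return matched, total, 2.0 * matched / total


print(explain_ratio('abcd', 'abdc'))  # (3, 8, 0.75): 'ab' plus one of 'c'/'d'
print(explain_ratio('abcd', 'abxx'))  # (2, 8, 0.5): only 'ab' matches
```

In the transposed case the matcher can keep only one of the two swapped characters in order, which is why it lands at 0.75 rather than 1.0.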
Other examples:
>>> sm(None, 'abcd10', 'abdc20').quick_ratio()
0.83333333333333337
>>> sm(None, 'abcd10', 'abdc20').ratio()
0.66666666666666663
>>> sm(None, 'abcd10', 'abdc20').real_quick_ratio()
1.0
You get a different interpretation for each "speed" of ratio.
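One way to read the three values, going by the difflib documentation, is that quick_ratio() and real_quick_ratio() are cheaper upper bounds on ratio() rather than alternative similarity measures. The sketch below only checks that ordering for the example strings; the function name all_ratios is an illustrative assumption.

```python
from difflib import SequenceMatcher


def all_ratios(a, b):
    """Return (real_quick_ratio, quick_ratio, ratio) for two sequences."""
    matcher = SequenceMatcher(None, a, b)
    return matcher.real_quick_ratio(), matcher.quick_ratio(), matcher.ratio()


real_quick, quick, full = all_ratios('abcd10', 'abdc20')
# Each cheaper estimate should be at least as large as the more exact one.
print(real_quick >= quick >= full)  # True
```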
I started thinking that all I wanted was a pass-fail kind of thing, so
I wondered if real_quick_ratio() might do the trick. But look at the
following experiments:
>>> sm(None, 'abcd10', 'abdc20').real_quick_ratio()
1.0
>>> sm(None, 'abcd10', 'abxx20').real_quick_ratio()
1.0
>>> sm(None, 'abcd10', 'abxx24').real_quick_ratio()
1.0
>>> sm(None, 'abcd10', 'anxx24').real_quick_ratio()
1.0
>>> sm(None, 'abcd10', 'qnxx24').real_quick_ratio()
1.0
It seems like there's no way to get real_quick_ratio() to say anything
except "it's a perfect match!" I'm wondering if someone didn't leave a
code stub unwritten:
def real_quick_ratio(self):
    return 1.0
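A possible explanation, offered as a sketch rather than a statement about the author's setup: every experiment above compares two strings of the same length, and in the CPython implementation real_quick_ratio() appears to look only at the lengths, returning 2.0 * min(len(a), len(b)) / (len(a) + len(b)). The helper length_only_bound below is hypothetical and simply restates that assumed formula so the two can be compared.

```python
from difflib import SequenceMatcher


def length_only_bound(a, b):
    """Assumed formula: twice the shorter length over the combined length."""
    return 2.0 * min(len(a), len(b)) / (len(a) + len(b))


# Same first string as in the experiments, but with shorter second strings.
for other in ('qnxx24', 'qnxx', 'q'):
    matcher = SequenceMatcher(None, 'abcd10', other)
    print(other, matcher.real_quick_ratio(), length_only_bound('abcd10', other))
```

With strings of different lengths the value drops below 1.0, so it is only useful for ruling out pairs whose lengths alone make a given threshold unreachable.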
Friday, July 30, 2004, 11:01:28 AM, you wrote:
> [Bruce Eckel]
> ...
>> What I'd like to do is find an algorithm that produces the results of
>> a text comparison as a percentage-match. Thus I would be able to
>> assert that my test samples must match the control sample by at least
>> (for example) 83% for the test to pass.
> >>> from difflib import SequenceMatcher as sm
> >>> sm(None, 'abc', 'xyz').ratio()
> >>> sm(None, 'abcd', 'abcd').ratio()
> 1.0
> >>> sm(None, 'abcd', 'uvwx').ratio()
> 0.0
> >>> sm(None, 'abcd', 'axyd').ratio()
> 0.5
> >>>
> SequenceMatcher works on sequences of hashable elements. Above, it's
> working on sequences of characters (aka "strings" <wink>). Other
> possibilities include sequences of lines ("files") and lists of
> integers.
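For the pass-fail use described in the quoted request, a threshold check might look like the sketch below. It assumes the upper-bound reading of the quicker methods, so they are tried first and the full ratio() only runs when they pass. The function name matches_at_least and the sample data are illustrative assumptions, not part of difflib.

```python
from difflib import SequenceMatcher


def matches_at_least(control, sample, threshold):
    """Return True if sample matches control by at least threshold (0.0 to 1.0).

    Hypothetical helper: cheap upper bounds are checked before the full ratio.
    """
    matcher = SequenceMatcher(None, control, sample)
    return (matcher.real_quick_ratio() >= threshold
            and matcher.quick_ratio() >= threshold
            and matcher.ratio() >= threshold)


# Works on strings (sequences of characters) ...
print(matches_at_least('abcd10', 'abdc20', 0.83))  # False
print(matches_at_least('abcd10', 'abcd10', 0.83))  # True

# ... and on lists of lines, where each whole line is one element.
control_lines = ['alpha\n', 'beta\n', 'gamma\n']
sample_lines = ['alpha\n', 'beta\n', 'delta\n']
print(matches_at_least(control_lines, sample_lines, 0.6))   # True
print(matches_at_least(control_lines, sample_lines, 0.83))  # False
```

When comparing lists of lines, a line counts as matched only if it is identical, so a one-character change inside a line costs the whole line.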
Bruce Eckel http://www.BruceEckel.com mailto:BruceEckel at MailBlocks.com
Contains electronic books: "Thinking in Java 3e" & "Thinking in C++ 2e"
Web log: http://www.mindview.net/WebLog
Subscribe to my newsletter:
http://www.mindview.net/Newsletter
My schedule can be found at:
http://www.mindview.net/Calendar
"The whole problem with the world is that fools and fanatics are always
so certain of themselves, and wiser people so full of doubts."
--Bertrand Russell