Ensuring symmetry in difflib.SequenceMatcher

John Yeung gallium.arsenide at gmail.com
Wed Nov 24 02:45:18 EST 2010


I'm generally pleased with difflib.SequenceMatcher:  It's probably not
the best available string matcher out there, but it's in the standard
library and I've seen worse in the wild.  One thing that kind of
bothers me is that it's sensitive to which argument you pick as "seq1"
and which you pick as "seq2":

Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import difflib
>>> difflib.SequenceMatcher(None, 'BYRD', 'BRADY').ratio()
0.44444444444444442
>>> difflib.SequenceMatcher(None, 'BRADY', 'BYRD').ratio()
0.66666666666666663
>>>

Is this a bug?  I am guessing the algorithm is implemented correctly,
and that it's just an inherent property of the algorithm used.  It's
certainly not what I'd call a desirable property.  Are there any
simple adjustments that can be made without sacrificing (too much)
performance?
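
For the sake of discussion, here is a minimal sketch of two workarounds I
can think of.  Neither is part of difflib; the names symmetric_ratio and
ordered_ratio are made up for illustration.  The first runs the matcher in
both directions and keeps the larger score, which roughly doubles the work.
The second puts the two arguments into a canonical (sorted) order first, so
it costs a single pass but simply picks one of the two scores by convention.

```python
from difflib import SequenceMatcher


def symmetric_ratio(a, b):
    """Hypothetical helper: score both argument orders and keep the larger.

    Roughly twice the cost of a single SequenceMatcher.ratio() call.
    """
    forward = SequenceMatcher(None, a, b).ratio()
    backward = SequenceMatcher(None, b, a).ratio()
    return max(forward, backward)


def ordered_ratio(a, b):
    """Hypothetical helper: always compare in sorted order.

    A single pass, but which score you get is decided by ordering alone,
    not by which one is "better".  Assumes a and b are mutually comparable
    (e.g. two strings).
    """
    first, second = sorted((a, b))
    return SequenceMatcher(None, first, second).ratio()


if __name__ == "__main__":
    # Both calls in each pair print the same value regardless of order.
    print(symmetric_ratio("BYRD", "BRADY"), symmetric_ratio("BRADY", "BYRD"))
    print(ordered_ratio("BYRD", "BRADY"), ordered_ratio("BRADY", "BYRD"))
```

Taking the max seems the safer choice when a missed match is worse than the
extra CPU time; the sorted-order version only guarantees consistency.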

John


