Algorithm used by difflib.get_close_matches

Wojtek Walczak gminick at bzt.bzt
Tue Sep 2 09:33:08 EDT 2008


On Tue, 2 Sep 2008 06:17:37 -0700 (PDT), Guillermo wrote:

> Does anyone know whether this function uses edit distance? If not,
> which algorithm is it using?

It is not edit distance. get_close_matches scores candidates with
SequenceMatcher, and the following passage comes from difflib.py:

SequenceMatcher is a flexible class for comparing pairs of sequences of
any type, so long as the sequence elements are hashable.  The basic
algorithm predates, and is a little fancier than, an algorithm
published in the late 1980's by Ratcliff and Obershelp under the
hyperbolic name "gestalt pattern matching".  The basic idea is to find
the longest contiguous matching subsequence that contains no "junk"
elements (R-O doesn't address junk).  The same idea is then applied
recursively to the pieces of the sequences to the left and to the right
of the matching subsequence.  This does not yield minimal edit
sequences, but does tend to yield matches that "look right" to
people.
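
To see how this plays out in practice, here is a minimal sketch using
only the standard difflib module. The word list and the sample strings
are illustrative choices of mine, not anything taken from difflib.py.
It relies on get_close_matches ranking candidates by
SequenceMatcher.ratio(), which is 2*M/T (M = number of matched
elements, T = total number of elements in both sequences), and keeping
those that reach the cutoff (0.6 by default).

```python
from difflib import SequenceMatcher, get_close_matches

# Illustrative word list (an assumption for this example).
words = ["ape", "apple", "peach", "puppy"]

# Candidates are scored with SequenceMatcher.ratio(); those scoring at
# least `cutoff` (default 0.6) are returned, best match first.
print(get_close_matches("appel", words))   # ['apple', 'ape']

# The first argument is the optional "junk" predicate; None disables it.
matcher = SequenceMatcher(None, "abcd", "bcde")

# The longest contiguous matching block is "bcd": 3 matched elements.
block = matcher.find_longest_match(0, 4, 0, 4)
print(block)                               # Match(a=1, b=0, size=3)

# ratio() = 2*M/T = 2*3/8. This is a similarity score in [0, 1],
# not an edit distance.
print(matcher.ratio())                     # 0.75
```

Note that the score measures how much of the two strings is covered by
matching blocks, so a higher value means more similar, which is the
opposite sense of an edit distance.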

HTH.

-- 
Regards,
Wojtek Walczak,
http://tosh.pl/gminick/


