How fuzzy is get_close_matches() in difflib?

Antoon Pardon apardon at forel.vub.ac.be
Fri Nov 17 10:21:33 EST 2006


On 2006-11-17, John Henry <john106henry at hotmail.com> wrote:
> I encountered a case where I am trying to match "HIDESST1" and
> "HIDESCT1" against ["HIDEDST1", "HIDEDCT1", "HIDEDCT2", "HIDEDCT3"]
>
> Well, they both hit "HIDEDST1" as the first match which is not exactly
> the result I was looking for.  I don't understand why "HIDESCT1" would
> not hit "HIDEDCT1" as a first choice.

    H I D E D S T 1     H I D E D C T 1

 H  .                   .
 I    .                   .
 D      .                   .
 E        .                   .
 S            .             
 C                                .
 T              .                   .
 1                .                   .

As far as I can see, the distance of HIDESCT1 to HIDEDCT1 is
the same as the distance of HIDESCT1 to HIDEDST1. In both
cases you have to remove one character from the target as well
as one character from the candidate in order to get the
same string.
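
A quick way to check this is to compute the similarity ratio yourself.
The sketch below assumes that get_close_matches() ranks candidates by
SequenceMatcher.ratio(), which is what the difflib documentation
describes; the variable names are just for illustration.

```python
from difflib import SequenceMatcher

target = "HIDESCT1"
candidates = ["HIDEDST1", "HIDEDCT1", "HIDEDCT2", "HIDEDCT3"]

# ratio() is 2.0 * M / T, where M is the number of matched characters
# and T is the combined length of both strings (16 here).
scores = {c: SequenceMatcher(None, c, target).ratio() for c in candidates}

for candidate, score in scores.items():
    print(candidate, score)
```

HIDEDST1 and HIDEDCT1 each share 7 matched characters with HIDESCT1,
so both score 14/16 = 0.875. The ratio alone cannot separate them.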

-- 
Antoon Pardon


