How fuzzy is get_close_matches() in difflib?

John Henry john106henry at hotmail.com
Fri Nov 17 13:44:02 EST 2006


I suppose you are right.  I guess I ended up with an odd case.

I was thinking that:

To change "HIDE*S*ST1" to "HIDE*D*ST1", all you do is remove the "*S*"
from the source and the "*D*" from the target.

In order to change "HIDE*SC*T1" to "HIDE*DS*T1", I thought you had to
remove 2 characters *SC* from the source.   Then I realized that it's
not true.  If you remove the "C" from the source, and the "D" from the
*DS* of the destination, it's a match (!)

So, yes, they have the same distance!
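
A minimal sketch to check this with difflib directly. The helper name
score() is only for illustration; it uses SequenceMatcher.ratio(), which
is documented as 2.0*M/T (M = matched characters, T = total length of
both strings). The remark about tie order reflects how CPython's
get_close_matches() appears to behave, not a documented guarantee.

```python
from difflib import SequenceMatcher, get_close_matches

candidates = ["HIDEDST1", "HIDEDCT1", "HIDEDCT2", "HIDEDCT3"]

def score(word, candidate):
    # Hypothetical helper: similarity ratio between a candidate and the word.
    return SequenceMatcher(None, candidate, word).ratio()

for word in ("HIDESST1", "HIDESCT1"):
    print(word)
    for cand in candidates:
        print("   ", cand, score(word, cand))
    # Best match first; candidates with equal scores still have to come
    # out in some order, so the first hit is not necessarily "better".
    print("    ->", get_close_matches(word, candidates))

# The pieces difflib lines up between "HIDEDST1" and "HIDESCT1":
# 7 characters in total, the same count as for "HIDEDCT1".
sm = SequenceMatcher(None, "HIDEDST1", "HIDESCT1")
print([("HIDEDST1"[m.a:m.a + m.size]) for m in sm.get_matching_blocks() if m.size])
```

Both "HIDEDST1" and "HIDEDCT1" share 7 of 8 characters with "HIDESCT1",
so their ratios are equal and which one is listed first is only a
tie-break.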


Antoon Pardon wrote:
> On 2006-11-17, John Henry <john106henry at hotmail.com> wrote:
> > I encountered a case where I am trying to match "HIDESST1" and
> > "HIDESCT1" against ["HIDEDST1", "HIDEDCT1", "HIDEDCT2", "HIDEDCT3"]
> >
> > Well, they both hit "HIDEDST1" as the first match which is not exactly
> > the result I was looking for.  I don't understand why "HIDESCT1" would
> > not hit "HIDEDCT1" as a first choice.
>
>     H I D E D S T 1     H I D E D C T 1
>
>  H  .                   .
>  I    .                   .
>  D      .                   .
>  E        .                   .
>  S            .
>  C                                .
>  T              .                   .
>  1                .                   .
>
> As far as I can see the distance of HIDESCT1 to HIDEDCT1 is
> the same as the distance of HIDESCT1 to HIDEDST1. In both
> cases you have to remove one character from the target as well
> as one character from the candidate in order to get the
> same substring.
> 
> -- 
> Antoon Pardon



