need some kind of "coherence index" for a group of strings

jladasky at itu.edu
Thu Nov 3 13:49:00 EDT 2016


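The Levenshtein distance is a precise measure of dissimilarity between two sequences: the minimum number of single-element edits (insertions, deletions, or substitutions) needed to change one sequence into the other.  You are right that it is fairly expensive to compute.  The usual dynamic-programming approach takes time proportional to the product of the two sequence lengths, and a score for a whole group would presumably need it for many pairs of strings.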
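
For reference, here is a minimal sketch of the textbook dynamic-programming way to compute that distance.  The function name levenshtein is just my choice for illustration, not something from a particular library, and this is not tuned for speed.

```python
def levenshtein(a, b):
    """Return the minimum number of single-element insertions, deletions,
    and substitutions needed to turn sequence a into sequence b."""
    # previous[j] is the distance between the elements of a seen so far
    # (none yet) and the first j elements of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        # Turning the first i elements of a into an empty sequence costs i deletions.
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x with y, or free if equal
            ))
        previous = current
    return previous[-1]
```

The nested loops are where the cost comes from: each call does work proportional to len(a) * len(b).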

But you asked for an algorithm that would determine whether groups of strings are "sort of similar".  How imprecise can you be?  An analysis of the frequency of each individual character in a string might be good enough for you.
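
Something along these lines is one way the character-frequency idea could look.  This is only a sketch under my own assumptions: I picked cosine similarity of the character counts as the comparison and the mean over all pairs as the group score, and the names char_similarity and coherence_index are made up for this example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def char_similarity(s, t):
    """Cosine similarity of the character-count vectors of s and t (0.0 to 1.0)."""
    cs, ct = Counter(s), Counter(t)
    # A Counter returns 0 for characters it has not seen.
    dot = sum(cs[ch] * ct[ch] for ch in cs)
    norm = sqrt(sum(n * n for n in cs.values())) * sqrt(sum(n * n for n in ct.values()))
    # An empty string has no characters to compare, so call that 0.0.
    return dot / norm if norm else 0.0

def coherence_index(strings):
    """Mean pairwise character similarity over a group of strings."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Assumption: a group of fewer than two strings is trivially coherent.
        return 1.0
    return sum(char_similarity(s, t) for s, t in pairs) / len(pairs)
```

Note that this ignores character order entirely, so anagrams such as "listen" and "silent" score as identical.  Whether that is acceptable depends on how imprecise you can afford to be.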



More information about the Python-list mailing list