need some kind of "coherence index" for a group of strings

Neil D. Cerutti neilc at norwich.edu
Thu Nov 3 16:08:34 EDT 2016


On 11/3/2016 1:49 PM, jladasky at itu.edu wrote:
> The Levenshtein distance is a very precise definition of dissimilarity between sequences.  It specifies the minimum number of single-element edits you would need to change one sequence into another.  You are right that it is fairly expensive to compute.
>
> But you asked for an algorithm that would determine whether groups of strings are "sort of similar".  How imprecise can you be?  An analysis of the frequency of each individual character in a string might be good enough for you.

I also once used a Levenshtein distance algorithm in Python (code
snippet D0DE4716-B6E6-4161-9219-2903BF8F547F) to compare students'
names. It worked, but turned out not to be what I needed. You may
also be able to use some tools "off the shelf" from Python's difflib.
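
As a rough sketch of the difflib route (not the snippet mentioned
above): one way to get a single number for a group of strings is to
average difflib.SequenceMatcher.ratio() over every pair. The name
coherence_index and the choice of the mean are my own assumptions
for illustration, and ratio() is a similarity score in [0, 1], not a
Levenshtein distance.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise SequenceMatcher ratio over a group of strings.

    Hypothetical helper: 1.0 means every pair is identical, 0.0 means
    no pair shares a matching character.
    """
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Assumption: zero or one string counts as fully coherent.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    print(coherence_index(["spam", "spat", "span"]))
    print(coherence_index(["spam", "eggs", "toast"]))
```

This compares every pair, so the work grows quadratically with the
number of strings. For a large group, scoring a random sample of
pairs may be good enough.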

-- 
Neil Cerutti



