need some kind of "coherence index" for a group of strings

Fillmore fillmore_remove at hotmail.com
Thu Nov 3 12:18:06 EDT 2016


Hi there, apologies for the generic question. Here is my problem: let's 
say that I have a list of lists of strings.

list1:    #strings are sort of similar to one another

   my_nice_string_blabla
   my_nice_string_blqbli
   my_nice_string_bl0bla
   my_nice_string_aru


list2:    #strings are mostly different from one another

   my_nice_string_blabla
   some_other_string
   yet_another_unrelated string
   wow_totally_different_from_others_too


I would like an algorithm that can look at the strings and determine 
that strings in list1 are sort of similar to one another, while the 
strings in list2 are all different.
Ideally, it would be nice to have some kind of 'coherence index' that I 
can exploit to separate lists given a certain threshold.

I was about to concoct something using Levenshtein distance, but then I 
figured that it would be expensive to compute and I may be reinventing 
the wheel.
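
To make the idea concrete, here is a rough sketch of what such an index 
could look like, using only the standard library. It swaps Levenshtein 
distance for difflib.SequenceMatcher.ratio() and averages that ratio 
over every pair of strings in a list. The name coherence_index and the 
choice of a plain mean are my own assumptions, not an established 
measure, and any threshold would have to be tuned on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise similarity of the strings, between 0.0 and 1.0.

    Hypothetical helper: 1.0 means every pair is identical, values near
    0.0 mean the strings share almost nothing. Lists with fewer than two
    strings are treated as trivially coherent.
    """
    pairs = list(combinations(strings, 2))
    if not pairs:
        return 1.0
    # SequenceMatcher.ratio() is a similarity score, not an edit distance.
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


list1 = [
    "my_nice_string_blabla",
    "my_nice_string_blqbli",
    "my_nice_string_bl0bla",
    "my_nice_string_aru",
]
list2 = [
    "my_nice_string_blabla",
    "some_other_string",
    "yet_another_unrelated string",
    "wow_totally_different_from_others_too",
]

for name, strings in (("list1", list1), ("list2", list2)):
    print(name, round(coherence_index(strings), 3))
```

With the two example lists above, list1 should score noticeably higher 
than list2. Note that this compares every pair, so the cost grows 
quadratically with the number of strings in each list.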

Thanks in advance to any Python masters who may have suggestions...





