need some kind of "coherence index" for a group of strings

Thu Nov 3 12:34:14 EDT 2016

On Thu, Nov 3, 2016 at 9:18 AM, Fillmore <fillmore_remove at hotmail.com>
wrote:

>
> Hi there, apologies for the generic question. Here is my problem: let's say
> that I have a list of lists of strings.
>
> list1:    #strings are sort of similar to one another
>
>   my_nice_string_blabla
>   my_nice_string_blqbli
>   my_nice_string_bl0bla
>   my_nice_string_aru
>
>
> list2:    #strings are mostly different from one another
>
>   my_nice_string_blabla
>   some_other_string
>   yet_another_unrelated string
>   wow_totally_different_from_others_too
>
>
> I would like an algorithm that can look at the strings and determine that
> strings in list1 are sort of similar to one another, while the strings in
> list2 are all different.
> Ideally, it would be nice to have some kind of 'coherence index' that I
> can exploit to separate lists given a certain threshold.
>
> I was about to concoct something using Levenshtein distance, but then I
> figured that it would be expensive to compute and I may be reinventing the
> wheel.
>
> Thanks in advance to any Python masters who may have suggestions...

When you say similar, do you mean similar in the number of duplicate
words/letters? Or were you more interested in similar sentence structure?
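
If it is the character-level kind of similarity you are after, here is a
minimal sketch of one possible "coherence index", using only the standard
library's difflib rather than Levenshtein distance. The name coherence_index
and the choice of averaging over all pairs are my own assumptions, not an
established metric, and any threshold would have to be tuned on your real
data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise similarity ratio, a float between 0.0 and 1.0.

    Hypothetical helper: averages difflib's ratio() over every
    unordered pair of strings in the list.
    """
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Assumption: a list with fewer than two strings is treated
        # as trivially coherent.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


list1 = ["my_nice_string_blabla", "my_nice_string_blqbli",
         "my_nice_string_bl0bla", "my_nice_string_aru"]
list2 = ["my_nice_string_blabla", "some_other_string",
         "yet_another_unrelated string",
         "wow_totally_different_from_others_too"]

print(coherence_index(list1))  # expected to be the higher of the two
print(coherence_index(list2))
```

Note that this compares every pair, so the cost grows quadratically with the
length of each list. That is probably fine for short lists like the ones
above, but for long lists you might want to compare against a sample, or
against a single reference string, instead.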