need some kind of "coherence index" for a group of strings

Mario R. Osorio nimbiotics at gmail.com
Thu Nov 3 22:37:02 EDT 2016


I don't know much about these topics, but wouldn't Soundex do the job?

 On Thursday, November 3, 2016 at 12:18:19 PM UTC-4, Fillmore wrote:
> Hi there, apologies for the generic question. Here is my problem: let's 
> say that I have a list of lists of strings.
> 
> list1:    #strings are sort of similar to one another
> 
>    my_nice_string_blabla
>    my_nice_string_blqbli
>    my_nice_string_bl0bla
>    my_nice_string_aru
> 
> 
> list2:    #strings are mostly different from one another
> 
>    my_nice_string_blabla
>    some_other_string
>    yet_another_unrelated string
>    wow_totally_different_from_others_too
> 
> 
> I would like an algorithm that can look at the strings and determine 
> that strings in list1 are sort of similar to one another, while the 
> strings in list2 are all different.
> Ideally, it would be nice to have some kind of 'coherence index' that I 
> can exploit to separate lists given a certain threshold.
> 
> I was about to concoct something using Levenshtein distance, but then I 
> figured that it would be expensive to compute and I may be reinventing 
> the wheel.
> 
> Thanks in advance to python masters that may have suggestions...
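
For what it's worth, one way to get a rough 'coherence index' without any
third-party package is to average the pairwise similarity ratio from the
standard library's difflib. This is only a sketch under assumptions: the
name coherence_index and the 0.6 threshold are made up for illustration,
and difflib's ratio is not Levenshtein distance, just a similarity score
between 0 and 1 based on matching blocks.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise similarity: 1.0 means identical, near 0.0 means unrelated."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Assumption: a group with fewer than two strings is trivially coherent.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


list1 = [
    "my_nice_string_blabla",
    "my_nice_string_blqbli",
    "my_nice_string_bl0bla",
    "my_nice_string_aru",
]
list2 = [
    "my_nice_string_blabla",
    "some_other_string",
    "yet_another_unrelated string",
    "wow_totally_different_from_others_too",
]

THRESHOLD = 0.6  # hypothetical cut-off, to be tuned on real data

for name, group in (("list1", list1), ("list2", list2)):
    score = coherence_index(group)
    label = "coherent" if score >= THRESHOLD else "mixed"
    print(name, round(score, 2), label)
```

Every pair is compared, so the number of comparisons grows quadratically
with the size of each list. For short lists of short strings that is
usually fine; for large groups, scoring a random sample of pairs would
be a cheaper approximation.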



