need some kind of "coherence index" for a group of strings

duncan smith duncan at invalid.invalid
Fri Nov 4 10:07:07 EDT 2016


On 03/11/16 16:18, Fillmore wrote:
> 
> Hi there, apologies for the generic question. Here is my problem: let's
> say that I have a list of lists of strings.
> 
> list1:    #strings are sort of similar to one another
> 
>   my_nice_string_blabla
>   my_nice_string_blqbli
>   my_nice_string_bl0bla
>   my_nice_string_aru
> 
> 
> list2:    #strings are mostly different from one another
> 
>   my_nice_string_blabla
>   some_other_string
>   yet_another_unrelated string
>   wow_totally_different_from_others_too
> 
> 
> I would like an algorithm that can look at the strings and determine
> that strings in list1 are sort of similar to one another, while the
> strings in list2 are all different.
> Ideally, it would be nice to have some kind of 'coherence index' that I
> can exploit to separate lists given a certain threshold.
> 
> I was about to concoct something using Levenshtein distance, but then I
> figured that it would be expensive to compute and I may be reinventing
> the wheel.
> 
> Thanks in advance to Python masters that may have suggestions...
> 
> 
> 

https://pypi.python.org/pypi/jellyfish/
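
For illustration only, here is a rough sketch of one way such a 'coherence index' could be computed: the mean similarity over all pairs of strings in a list. It is not taken from jellyfish. It uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure so that it runs without extra installs, and the function name coherence_index is made up for this example. A normalised edit distance (for instance one built on jellyfish.levenshtein_distance) could be swapped in for the ratio() call.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise similarity in [0, 1]; higher means more alike.

    Hypothetical helper for illustration. The similarity measure is
    difflib's ratio(), an assumption made here, not Levenshtein distance.
    """
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Zero or one string: treated as fully coherent by convention.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


list1 = [
    "my_nice_string_blabla",
    "my_nice_string_blqbli",
    "my_nice_string_bl0bla",
    "my_nice_string_aru",
]
list2 = [
    "my_nice_string_blabla",
    "some_other_string",
    "yet_another_unrelated string",
    "wow_totally_different_from_others_too",
]

print(coherence_index(list1))  # expected to be the higher of the two
print(coherence_index(list2))
```

The number of comparisons grows quadratically with the length of each list, so for long lists it may be worth sampling pairs. Any threshold for separating the lists would have to be calibrated on real data.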

Duncan


