newb: comapring two strings

Serge Orlov Serge.Orlov at gmail.com
Thu May 18 21:44:31 EDT 2006


manstey wrote:
> Hi,
>
> Is there a clever way to see if two strings of the same length vary by
> only one character, and what the character is in both strings.
>
> E.g. str1=yaqtil str2=yaqtel
>
> they differ at str1[4] and the difference is ('i','e')
>
> But if there was str1=yiqtol and str2=yaqtel, I am not interested.
>
> can anyone suggest a simple way to do this?
>
> My next problem is, I have a list of 300,000+ words and I want to find
> every pair of such strings. I thought I would first sort on length of
> string, but how do I iterate through the following:
>
> str1
> str2
> str3
> str4
> str5
>
> so that I compare str1 & str2, str1 & str3, str 1 & str4, str1 & str5,
> str2 & str3, str3 & str4, str3 & str5, str4 & str5.

If your strings are pretty short you can do it like this even without
sorting by length first:

def fuzzy_keys(s):
    for pos in range(len(s)):
        yield s[0:pos]+chr(0)+s[pos+1:]

def fuzzy_insert(d, s):
    for fuzzy_key in fuzzy_keys(s):
        if fuzzy_key in d:
            strings = d[fuzzy_key]
            if type(strings) is list:
                strings += s
            else:
                d[fuzzy_key] = [strings, s]
        else:
            d[fuzzy_key] = s

def gather_fuzzy_matches(d):
    for strings in d.itervalues():
        if type(strings) is list:
            yield strings

acc = {}
fuzzy_insert(acc, "yaqtel")
fuzzy_insert(acc, "yaqtil")
fuzzy_insert(acc, "oaqtil")
print list(gather_fuzzy_matches(acc))

prints

[['yaqtil', 'oaqtil'], ['yaqtel', 'yaqtil']]




More information about the Python-list mailing list