comapring two strings

Fri May 19 01:18:23 EDT 2006

"Paul McGuire" <ptmcg at austin.rr._bogus_.com> writes:
> >>> def offByNoMoreThanOneCharacter(a,b):
> ...     return len(a)==len(b) and sum(map(lambda (x,y): x==y, zip(a,b))) >=
> len(a)-1

Yikes!  How about (untested):

  def offByNoMoreThanOneCharacter(a,b):
    return len(a)==len(b) and \
     len([i for i in xrange(len(a)) if a[i] != b[i]]) <= 1

which seems more direct.  I'm not sure it's correct since I'd
consider "apple" and "apples" to be one character apart even though
their lengths are unequal.

But I think this type of function is useless for the problem at hand,
since there are around N=300,000 words, comparing every pair of them
is O(N**2) which is a bit painful.

I'd use a decorate-sort-undecorate approach like:

   # generate all the ways to 
   def generate_all_variants(word):
      wlower = word.lower()
      yield (wlower, word)
      for i in xrange(len(word)):
         yield (wlower[:i] + wlower[i+1:], word)

Use this to generate all the variants of all the words in the
dictionary, and write those out into a file, each line containing a
variant plus the original word.  Then use a sorting utility like the
unix "sort" program to sort the file.  Those programs work efficiently
even if the file is too large to fit in memory.  Then read the sorted
file to find equivalence classes on the variants.