newb: comapring two strings

Fri May 19 14:08:34 EDT 2006

I think a way to solve the problem may be:
1) create a little Python script to separate the original words in many
files, each one containing only words of the same length. Every
filename can contain the relative word length.
2) write a little C program with two nested loops, that scans all the
pairs of the words of a single file, looking for single char
differences using a scanning of the chars. Probably there are many
possible tricks to speed up this search (like separating the words in
subgroups, of using some assembly-derived tricks, etc) but maybe you
don't need them. As C data structure you may simply use a single block
(because you know the length of a single word, so the files produced by
Python can be without spaces and returns).

I have suggested C because if the words are all of the same length then
you have 30000^2 = 90 000 000 000 pairs to test.
If you want, before creating the C program you can create a little
Python+Psyco prototype that uses array.array of chars (because
sometimes Psyco uses them quite quickly).

Bye,
bearophile