Faster way to do this?

Thu May 22 01:39:56 EDT 2003

On Wed, 21 May 2003 08:06:19 -0500, Skip Montanaro <skip at pobox.com> wrote:

>
>    >> Another, quite a bit faster method. Would generating a list of all
>    >> possible words, then looking for them as dictionary keys be any
>    >> faster?
>
>    Greg> This will work, but be careful if your string length gets very
>    Greg> long.  At 6 letters, you've got 720 permutations which would go
>    Greg> rather quickly and not each too much memory.  At 8 letters, it
>    Greg> jumps to 40,320 permutations and at 10, it's already above 3.5
>    Greg> million.
>
>In private email with Freddie (I think he got confused by my cc to him in my
>original post) I sent him a slight modification of my original script
>(appended below).  It records all prefixes of the words it places in the
>dictionary, greatly increasing the startup time (from about 1 second to
>about 10 seconds), but also pruning the permutation tree quite a bit.
>Runtime looks like this on my Powerbook:
>
>    % python findwords.py somnambulistexplos
>    loaded 832166 words into dictionary in 11.02 seconds
>    found 3749 words in 10.43 seconds
>
>Warning: I wrote this script around 2am.  I'm sure it could use further
>improvement, and while I believe it accurately computed the hidden words
>correctly, I'm not motivated to exhaustively check each of the 3749 words it
>found in "somnambulistexplos". ;-)
>
[... snip nice code ...]

Here's a longer bit of code. It's a different algorithm, so I'm curious how
it would run with your word list (word per line is white-space-delimited so
it should work with this code) and on your machine (presumably faster than
my old 300mhz P2, though I have 320mb RAM). BTW, should we normalize CPU-bound
stuff by bogomips?

BTW, you can import this and search for words via findword2.test('someletters') and
it should reuse the dictionary after the first time. Or findword2.test('someletters', 1)
to pause before printing the found words.

====< findword2.py >======================================
# findwords2.py
# make special dict
chars2words=None
wordfile = r'C:\Info\Linguistics\moby\mwords\113809of.fic'

def makedict():
    global chars2words # global so you can control when this is done
    words = file(wordfile,'r').read().split()
    chars2words={}
    for w in words:
        ws = list(w); ws.sort(); ws=''.join(ws)
        chars2words.setdefault(ws,[]).append(w)
    print 'Dictionary has %s words in %s wordlists' % (len(words), len(chars2words))

def getwords(letters):
    found = {}
    charl = list(letters)
    charl.sort()
    charl = ''.join(charl).lower()
    nc = len(charl)
    charm = ['']*nc
    n = 0; count = 2**nc
    while n<count:
        carry = 1
        for i in xrange(nc):
            if not carry: break
            if charm[i]: charm[i] = ''
            else:
                charm[i] = charl[i]
                carry = 0
        wcharm = ''.join(charm)
        wordlist = chars2words.get(wcharm)
        if wordlist: found[wcharm] = '%15s: %s' %(wcharm, wordlist)
        n += 1

    out = found.values()
    out.sort()
    return '\n'.join(out)

def test(letters, pause=0):
    import time
    if not chars2words:
        start = time.clock()
        print 'Building chars2words dict from file "%s" ...' % wordfile
        makedict()    
        print '... took %6.3f seconds\n' % (time.clock()-start)
    print 'Searching for words ...'
    start = time.clock()
    result = getwords(letters)
    print '... took %6.3f seconds.\n' % (time.clock()-start)
    print 'Found %s words using one or more letters of "%s":\n' % (
                                result.count(':')+result.count(','), letters)
    if pause or __name__ == '__main__' and result.count(':')>20:
        raw_input('Press enter to continue with word list ')
    print result

if __name__ == '__main__':
    import sys
    try:
        if sys.argv[2:3]: wordfile = sys.argv[2]
        test(sys.argv[1])
    except Exception, e:
        print '%s: %s\n' % (e.__class__.__name__, e)
        print 'Usage: findwords2.py letter_seq [wordfile]'
==========================================================

Running with the moby scrabble word list, and somnambulistexplos

[22:37] C:\pywk\jumble>findwords2.py somnambulistexplos
Building chars2words dict from file "C:\Info\Linguistics\moby\mwords\113809of.fic" ...
Dictionary has 113809 words in 100097 wordlists
... took 10.350 seconds

Searching for words ...
... took 16.689 seconds.

Found 3953 words using one or more letters of "somnambulistexplos":

Press enter to continue with word list
(I'll leave that off ;-)

Regards,
Bengt Richter