Problem loading a file of words

Mon Jul 25 08:01:22 EDT 2005

On Sun, 24 Jul 2005 20:44:08 -0700, teoryn wrote:

> I've been spending today learning python and as an exercise I've ported
> a program I wrote in java that unscrambles a word. Before describing
> the problem, here's the code:
> 
> *--beginning of file--*
> #!/usr/bin/python
> # Filename: unscram.py
> 
> def sort_string(word):
>         '''Returns word in lowercase sorted alphabetically'''
>         word = str.lower(word)

It is generally considered better form to write that line as:

    word = word.lower()

>         word_list = []
>         for char in word:
>                 word_list.append(char)

If you want a list of characters, the best way of doing that is just:

    word_list = list(word)

>         word_list.sort()

>         sorted_word = ''
>         for char in word_list:
>                 sorted_word += char
>         return sorted_word

And the above four lines are best written as:

    return ''.join(word_list)
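
Taken together, the three suggestions above shrink the function to a few
lines. A sketch, which should behave the same as your original for
ordinary strings:

```python
def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    word_list = list(word.lower())   # lowercase, then split into characters
    word_list.sort()                 # sort the characters in place
    return ''.join(word_list)        # glue them back into one string
```

With this, both 'pear' and 'pare' come out as 'aepr', which is what lets
them share a key.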

> print 'Building dictionary...',
> 
> dictionary = { }
> 
> # Notice that you need to have a file named 'dictionary.txt'
> # in the same directory as this file. The format is to have
> # one word per line, such as the following (of course without
> # the # marks):
> 
> #test
> #hello
> #quit
> #night
> #pear
> #pare
> 
> f = file('dictionary.txt')
> 
> # This loop builds the dictionary, where the key is
> # the string after calling sort_string(), and the value
> # is the list of all 'regular' words (from the dictionary,
> # not sorted) that passing to sort_string() returns the key
> 
> while True:
>         line = f.readline()
>         if len(line) == 0:
>                 break
>         line = str.lower(line[:-1]) # convert to lowercase just in case
> and
>                                     # remove the return at the end of
> the line
>         sline = sort_string(line)
>         if sline in dictionary:     # this key already exist, add to
> existing list
>                 dictionary[sline].append(line)
>                 print 'Added %s to key %s' % (line,sline) #for testing
>         else:                       # create new key and list
>                 dictionary[sline] = [line]
>                 print 'Created key %s for %s' % (sline,line) #for
> testing
> f.close()

Your while-loop seems to have been mangled a little thanks to word-wrap.
In particular, I can't work out what that "and" is doing in the middle of
it.

Unless you are expecting really HUGE dictionary files (hundreds of
millions of lines) perhaps a better way of writing the above while-loop
would be:

print 'Building dictionary...',
dictionary = { }
f = file('dictionary.txt', 'r')
for line in f.readlines():
    line = line.strip()  # remove whitespace at both ends
    if line:  # line is not the empty string
        line = line.lower()
        sline = sort_string(line)
        if sline in dictionary:
            dictionary[sline].append(line)
            print 'Added %s to key %s' % (line,sline)
        else:
            dictionary[sline] = [line]
            print 'Created key %s for %s' % (sline,line)
f.close()
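
If the file ever did get too big to read in one go, one option is to loop
over the file object itself instead of f.readlines(), which hands you one
line at a time. The sketch below wraps the loop in a function that accepts
any sequence of lines. The name build_dictionary is made up for this
example and is not part of your script. It uses dict.setdefault to fold
the if/else into a single line, drops the testing prints, and carries a
compact sort_string so that it runs on its own (sorted() needs Python 2.4
or later).

```python
def sort_string(word):
    '''Returns word in lowercase sorted alphabetically'''
    return ''.join(sorted(word.lower()))

def build_dictionary(lines):
    '''Map each sorted-letter key to the list of words that produce it.

    lines can be an open file or any other sequence of strings.
    (Hypothetical helper, for illustration only.)
    '''
    dictionary = {}
    for line in lines:
        line = line.strip()          # remove whitespace at both ends
        if line:                     # skip blank lines
            line = line.lower()
            # setdefault returns the existing list for this key,
            # or stores and returns a new empty one
            dictionary.setdefault(sort_string(line), []).append(line)
    return dictionary
```

You would then use it along these lines:

    f = open('dictionary.txt')
    dictionary = build_dictionary(f)
    f.close()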

> print 'Ready!'
> 
> # This loop lets the user input a scrambled word, look for it in
> # dictionary, and print all matching unscrambled words.
> # If the user types 'quit' then the program ends.
> while True:
>         lookup = raw_input('Enter a scrambled word : ')
> 
>         results = dictionary[sort_string(lookup)]

This will fail with a KeyError if the sorted form of the word you enter is
not a key in the dictionary.
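
One way around that, sketched here with a hypothetical helper called
find_words, is dict.get with an empty list as the default, so that an
unknown word simply gives no results:

```python
def find_words(dictionary, lookup):
    '''Return the words stored under the sorted letters of lookup.'''
    key = ''.join(sorted(lookup.lower()))   # same key sort_string() builds
    return dictionary.get(key, [])          # [] rather than a KeyError
```

Catching the KeyError with try/except would work just as well; which one
reads better is mostly a matter of taste.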

>         for x in results:
>                 print x,
> 
>         print
> 
>         if lookup == 'quit':
>                 break

You probably want the test for quit to happen before the lookup, so that
typing 'quit' exits without printing any "unscrambled" words.
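
A rough sketch of that ordering follows. To keep it self-contained, the
made-up function unscramble_all takes the user's entries as a list
(standing in for repeated raw_input calls) and returns the matches
instead of printing them:

```python
def unscramble_all(dictionary, entries):
    '''Return the matches for each entry, stopping at 'quit'.'''
    answers = []
    for lookup in entries:
        if lookup == 'quit':         # test for quit first...
            break
        key = ''.join(sorted(lookup.lower()))
        answers.append(dictionary.get(key, []))   # ...then look the word up
    return answers
```

In the real program the loop body would be the same, with raw_input
supplying lookup and a print in place of the append.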

> *--end of file--*
> 
> 
> If you create dictionary.txt as suggested in the comments, it should
> work fine (assuming you pass a word that creates a valid key, I'll
> have to add exceptions later). The problem is when using a large
> dictionary.txt file (2.9 MB is the size of the dictionary I tested) it
> always gives an error, specifically:
> (Note: ccehimnostyz is for zymotechnics, which is in the
> large dictionary)
> 
> 
> *--beginning of example--*
> Enter a scrambled word : ccehimnostyz
> Traceback (most recent call last):
>   File "unscram.py", line 62, in ?
>     results = dictionary[sort_string(lookup)]
> KeyError: 'ccehimnostyz'
> *--end of example--*

If this error always happens for the LAST word in the text file, I'm
guessing there is no newline after that word. Your loop chops the final
character off every line with line[:-1], on the assumption that it is a
newline. On a last line with no newline, that slice instead removes the
"s" from the word before it is stored in the dictionary, so the key you
later look up is never created.
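
If that guess is right, the difference is easy to see at the interactive
prompt. This small sketch assumes the last line of the file is exactly
'zymotechnics' with nothing after it:

```python
last_line = 'zymotechnics'            # assumed: no trailing newline

chopped = last_line[:-1]              # blindly drops the final character
stripped = last_line.rstrip('\n')     # only removes a newline if present

# the same two operations on a line that does end with a newline
chopped_ok = 'zymotechnics\n'[:-1]
stripped_ok = 'zymotechnics\n'.rstrip('\n')
```

Here chopped is 'zymotechnic', while the other three are all
'zymotechnics', which is why strip() or rstrip() is the safer choice.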

-- 
Steven.