Performance issue

Sat Apr 2 11:15:28 EST 2005

Tom Carrick wrote:
> Hi,
> 
> In my attempted learning of python, I've decided to recode an old
> anagram solving program I made in C++. The C++ version runs in less
> than a second, while the python takes 30 seconds. I'm not willing to
> think it's just python being slow, so I was hoping someone could find
> a faster way of doing this.

I like your attitude, not thinking that it's just Python that is slow :-)

> Also, I was wondering if there was a more
> builtin, or just nicer way of converting a string to a list (or using
> the sort function on a list) than making a function for it.

String to list:   list("irmen")  # --> ['i','r','m','e','n']
Sorted list:      sorted("irmen")   # --> ['e', 'i', 'm', 'n', 'r']
(the latter works in Python 2.4+)

> 
> The words.txt here is just a copy of FreeBSD's /usr/share/dict/words
> 
> Anyway, the code:
> 
> import string
> 
> # Need a function to convert a string to a list to be
> # able to use the sort() function
> def string2list(s):
>     l = []
>     for i in range(0, len(s)):
>         l.append(s[i])
>     return l

... see above... just replace string2list(s) with sorted(s)

> 
> words = []
> found = []
> 
> anagram = raw_input("Find anagrams of word: ")
> 
> f = open('words.txt', 'r')
> file = f.read()
> f.close()

Style: don't use 'file' as a variable name, you're hiding the
builtin 'file' function

> 
> words = file.splitlines()

You can obtain this list without reading the file in its entirety,
by using the readlines method of file objects:

words=open("words.txt").readlines()

> 
> sorted_anagram = anagram.lower()
> sorted_anagram = string2list(anagram)
> sorted_anagram.sort(lambda x, y: cmp(x, y))

The lambda is optional and only slows it down :-)
But to get a sorted list of letters, just use sorted(s)
if you're on Python 2.4+

> while words:
>     if len(words[0]) == len(sorted_anagram):
>         wordlist = string2list(words[0])
>         wordlist.sort(lambda x, y: cmp(x, y))
>         sorted_wordlist = wordlist

(same here.. replacing this by sorted(words[0]) probably
will speed it up rather significantly, partly because
it avoids the creation of those temporary lists)

>         if sorted_anagram == sorted_wordlist:
>             found.append(words[0])
>     del words[0]
> 
> print "Anagrams of " + anagram + ": "
> while found:
>     print found[0] + " "
>     del found[0]

print " ".join(found)

Cheers

--Irmen de Jong