Performance issue

Sat Apr 2 16:59:54 EST 2005

On Sat, 02 Apr 2005 10:29:19 -0800, Shalabh Chaturvedi <shalabh at cafepy.com> wrote:

>Tom Carrick wrote:
>> Hi,
>> 
>> In my attempted learning of python, I've decided to recode an old
>> anagram solving program I made in C++. The C++ version runs in less
>> than a second, while the python takes 30 seconds. I'm not willing to
>> think it's just python being slow, so I was hoping someone could find
>> a faster way of doing this. Also, I was wondering if there was a more
>> builtin, or just nicer way of converting a string to a list (or using
>> the sort function on a list) than making a function for it.
>
><code snipped>
>
>Others have already given a good commentary and alternate suggestions. 
>Here is some more (and some disagreements):
>
>* Know your data structures (and how long different operations take). 
>Like don't do del L[0] unless required. This generally comes from 
>experience (and asking on this list).
>
>* list(any_sequence_here) will build a list from the sequence. There are 
>usually easy ways of converting built-in types - the Python docs will 
>help you here.
>
>* Try to write code as close to an english description of the problem as 
>possible. For example 'for word in words:' rather than using counters 
>and []. This is usually faster, clearer and IMO an important ingredient 
>of being 'Pythonic'.
>
>Anyway here's my rendition of your program:
>
>###
>anagram = raw_input("Find anagrams of word: ")
>lanagram = list(anagram)
>lanagram.sort()
>sorted_anagram = ''.join(lanagram).lower()
>
>f = open('/usr/share/dict/words', 'r')
>
>found = []
>
>for word in f:
>     word = word.strip('\n')
>     if len(word)==len(sorted_anagram):
>	sorted_word = list(word)
>	sorted_word.sort()
>	sorted_word = ''.join(sorted_word)
>         if sorted_word == sorted_anagram:
>	    found.append(word)
>
>print "Anagrams of %s:" % anagram
>
>for word in found:
>     print word
>###
>
>Hopefully it is fast enough.
>
If doing more than one anagram, I think making a set of anagram
dictionaries (one for each word length) and pickling them in
separate files for later re-use might help.

E.g., (untested!!) to make a dict (by wordlength) of anagram dicts, something like

d = {}
for word in open('/usr/share/dict/words'):
    word = word.strip().lower()
    d.setdefault(len(word), {}).setdefault(''.join(sorted(word)), []).append(word)))

Then
for wlen, adic in d.items():
    pickle_file_name = 'anagram_%s'% wlen
    # pickle adic and write it out to the file
    ...

Then the anagram utility can look for the appropriate pickle file per word length,
(i.e., 'anagram_%s'%len(word.strip())) and just load it to anadict and
print anadict(''.join(sorted(word.strip().lower())).

That's a sketch. Untested!! Gotta go ;-/

Regards,
Bengt Richter