Performance issue
Bengt Richter
bokr at oz.net
Sat Apr 2 16:59:54 EST 2005
On Sat, 02 Apr 2005 10:29:19 -0800, Shalabh Chaturvedi <shalabh at cafepy.com> wrote:
>Tom Carrick wrote:
>> Hi,
>>
>> In my attempted learning of python, I've decided to recode an old
>> anagram solving program I made in C++. The C++ version runs in less
>> than a second, while the python takes 30 seconds. I'm not willing to
>> think it's just python being slow, so I was hoping someone could find
>> a faster way of doing this. Also, I was wondering if there was a more
>> builtin, or just nicer way of converting a string to a list (or using
>> the sort function on a list) than making a function for it.
>
><code snipped>
>
>Others have already given a good commentary and alternate suggestions.
>Here is some more (and some disagreements):
>
>* Know your data structures (and how long different operations take).
>Like don't do del L[0] unless required. This generally comes from
>experience (and asking on this list).
>
>* list(any_sequence_here) will build a list from the sequence. There are
>usually easy ways of converting built-in types - the Python docs will
>help you here.
>
>* Try to write code as close to an english description of the problem as
>possible. For example 'for word in words:' rather than using counters
>and []. This is usually faster, clearer and IMO an important ingredient
>of being 'Pythonic'.
>
>Anyway here's my rendition of your program:
>
>###
>anagram = raw_input("Find anagrams of word: ")
>lanagram = list(anagram)
>lanagram.sort()
>sorted_anagram = ''.join(lanagram).lower()
>
>f = open('/usr/share/dict/words', 'r')
>
>found = []
>
>for word in f:
> word = word.strip('\n')
> if len(word)==len(sorted_anagram):
> sorted_word = list(word)
> sorted_word.sort()
> sorted_word = ''.join(sorted_word)
> if sorted_word == sorted_anagram:
> found.append(word)
>
>print "Anagrams of %s:" % anagram
>
>for word in found:
> print word
>###
>
>Hopefully it is fast enough.
>
If doing more than one anagram, I think making a set of anagram
dictionaries (one for each word length) and pickling them in
separate files for later re-use might help.
E.g., (untested!!) to make a dict (by wordlength) of anagram dicts, something like
d = {}
for word in open('/usr/share/dict/words'):
word = word.strip().lower()
d.setdefault(len(word), {}).setdefault(''.join(sorted(word)), []).append(word)))
Then
for wlen, adic in d.items():
pickle_file_name = 'anagram_%s'% wlen
# pickle adic and write it out to the file
...
Then the anagram utility can look for the appropriate pickle file per word length,
(i.e., 'anagram_%s'%len(word.strip())) and just load it to anadict and
print anadict(''.join(sorted(word.strip().lower())).
That's a sketch. Untested!! Gotta go ;-/
Regards,
Bengt Richter
More information about the Python-list
mailing list