[Tutor] script too slow
Alfred Milgrom
fredm@smartypantsco.com
Sun Feb 23 19:03:02 2003
At 12:57 PM 23/02/03 -0500, Paul Tremblay wrote:
>My script is running so slowly that it might prove useless, and I
>wondered if there was a way to speed it up.
>
>The script will convert Microsoft RTF to XML. I had originally written
>the script in perl, and now am converting it to python.
>
>I have completed two parts of the script. The first part uses regular
>expressions to break each line into tokens.
>
>perl => 20 seconds
>python => 45 seconds
>
>Not surprisingly, python ran slower than perl, which is designed around
>regular expressions. However, the next part proved very disappointing to
>me. This part reads each token, and determines if it is in a dictionary,
>and takes action if it is.
<snip>
> # now use the dictionaries to process the tokens
> def process_cw(self, token, str_token, space):
>     """Change the value of the control word by determining what
>     dictionary it belongs to"""
>
>     if token_changed == '*':
>         pass
>     elif self.needed_bool.has_key(token_changed):
>         token_changed = self.needed_bool[token_changed]
>     elif self.styles_1.has_key(token_changed):
>         token_changed = self.styles_1[token_changed]
>     elif self.styles_2.has_key(token_changed):
>         token_changed = self.styles_2[token_changed]
>         num = self.divide_num(num, 2)
>
>     # etc. There are around a dozen such statements
>
>
>It is this last function, the "def process_cw", that eats up all the
>clock time. If I skip over this function, then I chop around 30 seconds
>off the script.
>
>The dictionary part of the script seems so slow that I am guessing I am
>doing something wrong, and that Python has to read in the dictionary
>each time it starts the function.
I am not sure if I am on the right track, but as I understand it each
element is in only one dictionary, and some different things need to be
done depending on which dictionary it is in (such as changing num).
Have you thought of having a "super dictionary" which lists where each
element is? The super-dictionary can be created by the program itself, so
that you don't have to worry about keeping the individual dictionaries
and the super-dictionary in sync. Something along the lines of:
dict1 = {'a':1, 'b':2, 'c':3}
dict2 = {'d':4, 'e':5, 'f':6}
superdict = {}
dictlist = [dict1, dict2]
for dictionary in dictlist:
    for key in dictionary:
        superdict[key] = dictionary

items = 'adbecf'
result = []
for letter in items:
    result.append(superdict[letter][letter])
print result

[1, 4, 2, 5, 3, 6]
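Taking the idea one step further (a sketch only: the dictionary contents
and the divide_num helper below are made-up stand-ins, not taken from
Paul's actual script), the super-dictionary could map each token straight
to its replacement value plus an optional action, so a single lookup
replaces the whole chain of elif tests:

```python
# Stand-ins for the real dictionaries and helper in the original script.
needed_bool = {'b': 'bold'}
styles_1 = {'i': 'italic'}
styles_2 = {'s': 'strike'}

def divide_num(num, by):
    return num // by

# Build the dispatch table once at startup: each key maps to its
# replacement value and the extra action (if any) for its dictionary.
dispatch = {}
for d, action in [(needed_bool, None),
                  (styles_1, None),
                  (styles_2, divide_num)]:
    for key in d:
        dispatch[key] = (d[key], action)

def process_cw(token, num):
    if token == '*':
        return token, num
    entry = dispatch.get(token)
    if entry is None:          # unknown token: pass it through unchanged
        return token, num
    replacement, action = entry
    if action is not None:     # extra work tied to the source dictionary
        num = action(num, 2)
    return replacement, num
```

For example, process_cw('s', 10) returns ('strike', 5), and every token
costs one dictionary lookup instead of a dozen has_key tests.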
If some of the elements are present in more than one dictionary, the order
of the dictionaries in creating the super-dictionary becomes important:
dictionaries later in the list overwrite the entries of earlier ones.
However, I am surprised that dictionary lookups are taking the time. Is
there anything else in that function that may be eating up the time?
HTH,
Fred Milgrom