[Tutor] script too slow
Alfred Milgrom
fredm@smartypantsco.com
Sun Feb 23 19:03:02 2003
At 12:57 PM 23/02/03 -0500, Paul Tremblay wrote:
>My script is running so slowly that it might prove useless, and I
>wondered if there was a way to speed it up.
>
>The script will convert Microsoft RTF to XML. I had originally written
>the script in perl, and now am converting it to python.
>
>I have completed two parts of the script. The first part uses regular
>expressions to break each line into tokens.
>
>perl => 20 seconds
>python => 45 seconds
>
>Not surprisingly, python ran slower than perl, which is designed around
>regular expressions. However, the next part proved very disappointing to
>me. This part reads each token, and determines if it is in a dictionary,
>and takes action if it is.
<snip>
> # now use the dictionaries to process the tokens
> def process_cw(self, token, str_token, space):
>     """Change the value of the control word by determining what
>     dictionary it belongs to"""
>
>     if token_changed == '*':
>         pass
>     elif self.needed_bool.has_key(token_changed):
>         token_changed = self.needed_bool[token_changed]
>     elif self.styles_1.has_key(token_changed):
>         token_changed = self.styles_1[token_changed]
>     elif self.styles_2.has_key(token_changed):
>         token_changed = self.styles_2[token_changed]
>         num = self.divide_num(num, 2)
>
>     # etc. There are around a dozen such statements
>
>
>It is this last function, the "def process_cw", that eats up all the
>clock time. If I skip over this function, then I chop around 30 seconds
>off the script.
>
>The dictionary part of the script seems so slow that I am guessing I am
>doing something wrong, and that Python has to read in the dictionary
>each time it starts the function.
I am not sure if I am on the right track, but as I understand it each
element is in only one dictionary, and some different things need to be
done depending on which dictionary it is in (such as changing num).
Have you thought of having a "super dictionary" which lists where each
element is? The super-dictionary can be created by the program itself, so
that you don't have to worry about keeping the individual dictionaries
and the super-dictionary in sync. Something along the lines of:
dict1 = {'a':1, 'b':2, 'c':3}
dict2 = {'d':4, 'e':5, 'f':6}
superdict = {}
dictlist = [dict1, dict2]
for dictionary in dictlist:
    for key in dictionary:
        superdict[key] = dictionary

items = 'adbecf'
result = []
for letter in items:
    result.append(superdict[letter][letter])
print result

[1, 4, 2, 5, 3, 6]
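Taking the idea one step further (a sketch only: the dictionary contents
and the divide_num helper below are made-up stand-ins, not taken from
Paul's actual script), the super-dictionary could map each token straight
to its replacement value plus an optional action, so a single lookup
replaces the whole chain of elif tests:

```python
# Stand-ins for the real dictionaries and helper in the original script.
needed_bool = {'b': 'bold'}
styles_1 = {'i': 'italic'}
styles_2 = {'s': 'strike'}

def divide_num(num, by):
    return num // by

# Build the dispatch table once at startup: each key maps to its
# replacement value and the extra action (if any) for its dictionary.
dispatch = {}
for d, action in [(needed_bool, None),
                  (styles_1, None),
                  (styles_2, divide_num)]:
    for key in d:
        dispatch[key] = (d[key], action)

def process_cw(token, num):
    if token == '*':
        return token, num
    entry = dispatch.get(token)
    if entry is None:          # unknown token: pass it through unchanged
        return token, num
    replacement, action = entry
    if action is not None:     # extra work tied to the source dictionary
        num = action(num, 2)
    return replacement, num
```

For example, process_cw('s', 10) returns ('strike', 5), and every token
costs one dictionary lookup instead of a dozen has_key tests.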
If some of the elements are present in more than one dictionary, the order
of the dictionaries in creating the super-dictionary becomes important:
dictionaries later in the list overwrite the entries of earlier ones.
However, I am surprised that dictionary lookups are taking the time. Is
there anything else in that function that may be eating up the time?
HTH,
Fred Milgrom