[Tutor] script still too slow

Jeff Shannon jeff@ccvcorp.com
Thu Feb 27 14:40:01 2003


Paul Tremblay wrote:

>After re-writing parts of my script for 8 hours yesterday, it is still
>almost 3 times slower than its perl counterpart. 
>[...]
>    def evaluate_token(self,token):
>        """Evaluate tokens. Return a value if the token is not a 
>        control word. Otherwise, pass token onto another method
>        for further evaluation."""
>        if token == '{':
>            self.__bracket_count = self.__bracket_count + 1
>            num = '%04d' % self.__bracket_count
>            token = 'ob<nu<nu<nu<%(num)s<{\n' % vars()
>        elif token == '}':
>            num = '%04d' % self.__bracket_count
>            token = 'cb<nu<nu<nu<%(num)s<}\n' % vars()
>            self.__bracket_count = self.__bracket_count - 1
>        elif token == r'\{':
>            token = 'tx<es<nu<nu<nu<{\n'
>        elif token == r'\}':
>            token = 'tx<es<nu<nu<nu<}\n'
>        elif token == r'\\': # double or escaped \
>            token = 'tx<es<nu<nu<nu<\\\n'
>        elif token[0:1] != '\\': # single \
>            token = 'tx<nu<nu<nu<nu<%(token)s\n' % vars()
>        else:
>            token = self.evaluate_cw(token)
>
>        return token
>
>This function takes around 17 seconds to run. The tokens have to be
>further evaluated in many cases with two other functions. Each of these
>functions takes 17 seconds as well.
>

This function, to my mind, screams out to be made into a dictionary 
lookup.  (Any time I'm dealing with a situation of "pick one option from 
this list of possibilities", or what in C/C++ would be a switch/case, I 
immediately think dictionary.)  If you put each of the given options 
into a small one- to three-line function and put all of those functions 
into a dictionary, you can then change this function into something like 
the following:

    def evaluate_token(self, token):
        function = self.token_dict.get(token, self.evaluate_cw)
        return function(self, token)  # passing in  self so that we can 
access self.__bracket_count

Or, almost equivalently but sparing the helper functions from needing to 
know so much about the bracket count:

    def evaluate_token(self, token):
        function = self.token_dict.get(token, self.evaluate_cw)
        token, self.__bracket_count = function(token, self.__bracket_count)
        return token

This will have the advantage of saving you quite a few comparisons -- 
just how much savings it gives will depend on the proportion of tokens 
that make it to the 'else' clause above, but since I'd imagine that this 
will be the majority of them, you're trading trading seven comparisons 
for a dict lookup and (in some small proportion of cases) an extra 
function call.  Since I presume that you're calling this function (at 
least) once for every token in the file, those savings will add up.

I also suspect that using the "formatstring % vars()" trick that you use 
is slower than simply making use of the actual variables in question. 
 Try swapping, say, "token = 'tx<nu<nu<nu<nu<%(token)s\n' % vars()" with 
"token = 'tx<nu<nu<nu<nu<%s\n' % token", and see if you get a speed 
increase.  (You're saving at least a function call each time, and 
possibly a dictionary construction as well, depending on how vars() 
operates.)  I wouldn't expect a large increase but it should help.  I 
personally dislike the "formatstring % vars()" trick in general, anyhow, 
because it feels like it *is* a trick, and that means that it's not 
clear and explicit.  I'd rather do the extra typing, and since you're 
only replacing one variable at a time anyhow, you're not really gaining 
anything by using vars() instead of explicitly naming variables.  I'm 
particularly puzzled by these bits:

            num = '%04d' % self.__bracket_count
            token = 'ob<nu<nu<nu<%(num)s<{\n' % vars()

which could quite easily be replaced by:

            token = 'ob<nu<nu<nu<%04d<{\n'% self.__bracket_count

You've introduced a completely spurious intermediary step, and then used a (potentially slow) trick to access that intermediate result.


My guess is that there's lots of this sort of thing in your program -- 
small inefficiencies that add up.

>I am beginning to think that python is just much slower than perl, and
>shouldn't be used for this task?
>

Not at all.  You just aren't used to thinking about how to write 
efficient Python.  For most tasks, pythonic Python code will be fairly 
equivalent in speed to perlish Perl code.  I suspect that if I were to 
start learning Perl and tried to convert a sizeable Python program to 
Perl, my port would be slower than the original, too, but that's just 
because I'm unfamiliar with good Perl practices, not because Perl is 
slower.  With practice, your Python will become more efficient and 
you'll be able to find lots of ways to optimize your code.

>Sorry I am vague on asking specifics, but I can't really pin point the
>exact problem with speed, and obviously I can't dump hundreds of lines
>on the mailing list.
>

One thing that others have done in the past, is to post their code on 
the web and give a link to it in the mailing list.  That way people who 
are interested in helping out can investigate and make suggestings.

Jeff Shannon
Technician/Programmer
Credit International