Method much slower than function?

Wed Jun 13 21:28:27 EDT 2007

On 2007-06-14, idoerg at gmail.com <idoerg at gmail.com> wrote:
> Hi all,
>
> I am running Python 2.5 on Feisty Ubuntu. I came across some code that
> is substantially slower when in a method than in a function.
>
> ################# START SOURCE #############
> # The function
>
> def readgenome(filehandle):
>   s = ''
>   for line in filehandle.xreadlines():
>       if '>' in line:
>           continue
>       s += line.strip()
>   return s
>
> # The method in a class
> class bar:
>   def readgenome(self, filehandle):
>       self.s = ''
>       for line in filehandle.xreadlines():
>           if '>' in line:
>               continue
>           self.s += line.strip()
>
> ################# END SOURCE ##############
> When running the function and the method on a 20,000 line text file, I
> get the following:
>
>>>> cProfile.run("bar.readgenome(open('cb_foo'))")
>          20004 function calls in 10.214 CPU seconds
>
>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall
> filename:lineno(function)
>         1    0.000    0.000   10.214   10.214 <string>:1(<module>)
>         1   10.205   10.205   10.214   10.214 reader.py:11(readgenome)
>         1    0.000    0.000    0.000    0.000 {method 'disable' of
> '_lsprof.Profiler' objects}
>     19999    0.009    0.000    0.009    0.000 {method 'strip' of 'str'
> objects}
>         1    0.000    0.000    0.000    0.000 {method 'xreadlines' of
> 'file' objects}
>         1    0.000    0.000    0.000    0.000 {open}
>
>
>>>> cProfile.run("z=r.readgenome(open('cb_foo'))")
>          20004 function calls in 0.041 CPU seconds
>
>    Ordered by: standard name
>
>    ncalls  tottime  percall  cumtime  percall
> filename:lineno(function)
>         1    0.000    0.000    0.041    0.041 <string>:1(<module>)
>         1    0.035    0.035    0.041    0.041 reader.py:2(readgenome)
>         1    0.000    0.000    0.000    0.000 {method 'disable' of
> '_lsprof.Profiler' objects}
>     19999    0.007    0.000    0.007    0.000 {method 'strip' of 'str'
> objects}
>         1    0.000    0.000    0.000    0.000 {method 'xreadlines' of
> 'file' objects}
>         1    0.000    0.000    0.000    0.000 {open}
>
>
> The method takes > 10 seconds, the function call 0.041 seconds!
>
> Yes, I know that I wrote the underlying code rather
> inefficiently, and I can streamline it with a single
> file.read() call instead of an xreadlines() + strip loop.
> Still, the differences in performance are rather staggering!
> Any comments?

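The repeated use of the attribute, self.s, is the likely culprit.
The lookups themselves are cheap. The bigger cost is probably
that CPython can only extend a string in place for s += ... when
s is a plain local name. With self.s, each += builds a new string
and copies everything accumulated so far, so the loop becomes
quadratic in the length of the result.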

Try the following simple optimization, using a local variable
instead of an attribute to build up the result.

# The method in a class
class bar:
    def readgenome(self, filehandle):
        s = ''
        for line in filehandle.xreadlines():
            if '>' in line:
                continue
            s += line.strip()
        self.s = s
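
To check the difference on your own machine, something like the
following rough sketch may help. It is not the original reader.py:
the class names, make_lines() and time_reader() are made up for
illustration, and it reads from a list of strings rather than a
real file so that it is self-contained. Timings will vary with the
interpreter version and input size, so none are quoted here.

```python
import time

class AttrReader(object):
    """Accumulates directly on the attribute, as in the original method."""
    def readgenome(self, lines):
        self.s = ''
        for line in lines:
            if '>' in line:
                continue
            self.s += line.strip()

class LocalReader(object):
    """Accumulates in a local name and stores the attribute once."""
    def readgenome(self, lines):
        s = ''
        for line in lines:
            if '>' in line:
                continue
            s += line.strip()
        self.s = s

def make_lines(n):
    # Hypothetical FASTA-like input: one header line, then n sequence lines.
    return ['>header\n'] + ['ACGT' * 15 + '\n'] * n

def time_reader(cls, lines):
    # Wall-clock seconds for a single run of cls().readgenome(lines).
    start = time.time()
    cls().readgenome(lines)
    return time.time() - start

lines = make_lines(2000)
a, b = AttrReader(), LocalReader()
a.readgenome(lines)
b.readgenome(lines)
print(a.s == b.s)  # both variants should build the same string
print("attribute: %.4f s" % time_reader(AttrReader, lines))
print("local:     %.4f s" % time_reader(LocalReader, lines))
```

Raising the argument to make_lines() toward the 20,000 lines of the
original file should make any gap between the two easier to see.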

To speed things up further, consider the str.join idiom instead
of repeated +=, with a generator expression in place of the
explicit loop.

# The method in a class
class bar:
    def readgenome(self, filehandle):
        # Skip header lines (those containing '>'), as the loop does.
        self.s = ''.join(line.strip() for line in filehandle
                         if '>' not in line)
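
The generator expression can also carry the header test from the
original loop, so that lines containing '>' are still skipped. A
small sketch comparing the two forms on made-up data (the function
names and the sample lines are only for illustration):

```python
def readgenome_loop(lines):
    # Explicit loop, as in the original function.
    s = ''
    for line in lines:
        if '>' in line:
            continue
        s += line.strip()
    return s

def readgenome_join(lines):
    # Same filtering, expressed as a generator expression fed to join.
    return ''.join(line.strip() for line in lines if '>' not in line)

# Hypothetical input: two header lines and three sequence lines.
sample = ['>seq1 made-up header\n', 'ACGT\n', '  GGCC  \n', '>seq2\n', 'TTAA\n']

print(readgenome_join(sample))                             # ACGTGGCCTTAA
print(readgenome_join(sample) == readgenome_loop(sample))  # True
```

Any iterable of lines works here, including an open file object.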

-- 
Neil Cerutti