[Numpy-discussion] profiling line by line
Ondrej Certik
ondrej at certik.cz
Thu Sep 18 06:46:29 EDT 2008
On Thu, Sep 18, 2008 at 1:29 AM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Sep 17, 2008 at 18:09, Ondrej Certik <ondrej at certik.cz> wrote:
>> On Wed, Sep 17, 2008 at 3:56 AM, Robert Kern <robert.kern at gmail.com> wrote:
>>> On Mon, Sep 15, 2008 at 11:13, Arnar Flatberg <arnar.flatberg at gmail.com> wrote:
>>>> That would make me an extremely happy user, I've been looking for this for
>>>> years!
>>>> I can't imagine I'm the only one who profiles some hundred lines of code and
>>>> ends up with 90% of total time in the dot-function
>>>
>>> For the time being, you can grab it here:
>>>
>>> http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
>>>
>>> It requires Cython and a C compiler to build. I'm still debating
>>> myself about the desired workflow for using it, but for now, it only
>>> profiles functions which you have registered with it. I have made the
>>> profiler work as a decorator to make this easy. E.g.,
>>>
>>> from line_profiler import LineProfiler
>>>
>>> profile = LineProfiler()
>>>
>>> @profile
>>> def my_slow_func():
>>> ...
>>>
>>> profile.dump_stats('my_slow_func.lprof')
>>>
>>> This is kind of inconvenient, so I have a little script that handles
>>> everything except for the @profile itself. It started as a script to
>>> run cProfile nicely, so it actually does that by default.
>>>
>>> I took pystone.py from the Python source and added a couple of
>>> @profile decorators to demonstrate:
>>>
>>>
>>> [line_profiler]$ ./kernprof.py --help
>>> Usage: ./kernprof.py [-s setupfile] [-o output_file_path] scriptfile [arg] ...
>>>
>>> Options:
>>> -h, --help show this help message and exit
>>> -l, --line-by-line Use the line-by-line profiler from the line_profiler
>>> module instead of cProfile. Implies --builtin.
>>> -b, --builtin Put 'profile' in the builtins. Use 'profile.enable()'
>>> and 'profile.disable()' in your code to turn it on and
>>> off, or '@profile' to decorate a single function, or
>>> 'with profile:' to profile a single section of code.
>>> -o OUTFILE, --outfile=OUTFILE
>>> Save stats to <outfile>
>>> -s SETUP, --setup=SETUP
>>> Code to execute before the code to profile
>>>
>>> [line_profiler]$ ./kernprof.py -l pystone.py
>>> Pystone(1.1) time for 50000 passes = 11.34
>>> This machine benchmarks at 4409.17 pystones/second
>>> Wrote profile results to pystone.py.lprof
>>>
>>> [line_profiler]$ ./view_line_prof.py pystone.py.lprof
>>> Timer unit: 1e-06 s
>>>
>>> File: pystone.py
>>> Function: Proc0 at line 79
>>> Total time: 8.46516 s
>> [...]
>>
>> This is what I am getting:
>>
>> $ ./kernprof.py -l pystone.py
>> Wrote profile results to pystone.py.lprof
>> $ ./view_line_prof.py pystone.py.lprof
>> Timer unit: 1e-06 s
>>
>> $
>>
>> So I think you meant:
>>
>> $ ./kernprof.py -l mystone.py
>> 20628
>> Wrote profile results to mystone.py.lprof
>> $ ./view_line_prof.py mystone.py.lprof
>> Timer unit: 1e-06 s
>>
>> File: pystone.py
>> Function: Proc0 at line 79
>> Total time: 13.0803 s
>> [...]
>>
>> Now it works.
>
> No, I meant pystone.py. My script-finding code may have (incorrectly)
> found a different, uninstrumented pystone.py file somewhere else,
> though. Try with "./pystone.py".
>
>> This is an excellent piece of software! Nice job. Just today I needed
>> such a thing!
>>
>>
>> How do you easily install it?
>
> "python setup.py install" should have installed the module. I haven't
> done anything with the scripts, yet.
>
>> I usually do "python setup.py install
>> --home=~/lib" and I have the PYTHONPATH + PATH setup in my .bashrc,
>> but I then need to manually remove the stuff from my ~/lib if I want
>> to uninstall, which sucks. So this time I just did "python setup.py
>> build" and moved the .so file manually to the current dir. But there
>> must be a better way. What is your workflow?
>
> For things I am developing on, I build them in-place, using the
> setuptools "develop" command to add the appropriate path to the
> easy-install.pth file. To remove, I would just edit that file.
>
> For thing I'm not developing on, I usually build and install an egg if
> at all possible.
>
> But then, I'm typically on a single-user box where I'm root, so I
> sometimes do nasty and unsanitary things like chown -R rkern:rkern
> /usr/local/.
>
>> Anyway, so I used it on my code and here is what I got:
>>
>> File: hermes1d/mesh.py
>> Function: integrate_function at line 119
>> Total time: 0.647412 s
>>
>> Line # Hits Time % Time Line Contents
>> =====================================================
>> 119 @profile
>> 120 def integrate_function(self, f):
>> 121 """
>> 122 Integrate the function
>> "f" on the element.
>> 123 """
>> 124 96 1091 0.2 from numpy import array
>> 125 96 461070 71.2 from scipy.integrate
>> import quadrature
>> 126 96 496 0.1 a, b =
>> self.nodes[0].x, self.nodes[1].x
>> 127 96 418 0.1 def func(x):
>> 128 #print x
>> 129 #print
>> array([f.f(y) for y in x])
>> 130 return
>> array([f.f(y) for y in x])
>> 131 96 183903 28.4 val, err =
>> quadrature(func, a, b)
>> 132 #print val, a, b
>> 133 #stop
>> 134 96 434 0.1 return val
>>
>>
>>
>> This is interesting, because 70% of time is spent on "from
>> scipy.integrate import quadrature". So this is not good.
>>
>> So I moved it out and voila! My code is 3.4x faster. I really didn't
>> know that importing (scipy.integrate) is *so* slow.
>
> It should only be slow the first time. If it's not, then there's a
> problem somewhere (and I wouldn't put it past being in the profiler;
> it might screw up something in the import mechanism). Can you do the
> import both outside the function *and* inside to see if that changes
> the result?
Sure. "b.py":
from scipy.integrate import quadrature
---------------
def a():
b = 1
for x in range(1000000):
a()
---------------
"a.py":
---------------
def a():
from scipy.integrate import quadrature
b = 1
for x in range(1000000):
a()
---------------
and:
ondra at pc232:~/repos/hermes1d$ time python a.py
real 0m2.655s
user 0m2.612s
sys 0m0.044s
ondra at pc232:~/repos/hermes1d$ time python b.py
real 0m0.620s
user 0m0.540s
sys 0m0.080s
So I guess that speaks for itself. Don't have time to dig more, but I
think it's this line in scipy/integrate/__init__:
__all__ = filter(lambda s:not s.startswith('_'),dir())
But maybe also something else, I don't know.
Ondrej
Ondrej
More information about the NumPy-Discussion
mailing list