[Cython] AddTraceback() slows down generators

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Sat Jan 21 22:16:20 CET 2012


On 01/21/2012 07:50 PM, Stefan Behnel wrote:
> Hi,
>
> I did some callgrind profiling on Cython's generators and was surprised to
> find that AddTraceback() represents a serious performance penalty for short
> running generators.
>
> I profiled a compiled Python implementation of itertools.groupby(), which
> yields (key, group) tuples where the group is an iterator again. I ran this
> code in Python for benchmarking:
>
> """
> L = sorted(range(1000)*5)
>
> all(list(g) for k,g in groupby(L))
> """
>
> Groups tend to be rather short in real code, often just one or a couple of
> items, so unpacking the group iterator into a list will usually be a quick
> loop and then the generator raises StopIteration on termination and builds
> a traceback for it. According to callgrind (which, I should note, tends to
> overestimate the amount of time spent in memory allocation), the iteration
> during the group unpacking takes about 30% of the overall runtime of the
> all() loop, and the AddTraceback() call at the end of each group traversal
> takes up to 25% (!) on my side. That means that more than 80% of the group
> unpacking time goes into raising StopIteration from the generators. I
> attached the call graph with the relative timings.

OT: Since you complain that callgrind is inaccurate; are you aware of 
sampling profilers, such as Google perftools? (I don't have experience 
with callgrind myself)

http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html

http://pypi.python.org/pypi/yep

Dag

>
> About half of the exception raising time is eaten by PyString_FromFormat()
> that builds the function-name + line-position string (which, I may note, is
> basically a convenience feature). This string is a constant for a
> generator's StopIteration exception, at least for each final return point
> in a generator, but here it is being recreated over and over again, for
> each exception that gets raised.
>
> Even if we keep creating a new frame instance each time (which should be ok
> because CPython has a frame instance cache already and we'd only create one
> during the generator lifetime), the whole code object could actually be
> cached after the first creation, preferably bound to the lifetime of the
> generator creator function/method. Or, more generally, one code object per
> generator termination point, which will be a single point in the majority
> of cases. For the specific code above, that should shave off almost 20% of
> the overall runtime of the all() loop.
>
> I think that's totally worth doing.
>
> Stefan
>
>
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel



More information about the cython-devel mailing list