trace.py and coverage.py

Thu Jan 3 12:30:44 EST 2002

At 08:53 -0800 2002-01-03, Zooko wrote:
>The usage interfaces are very different.  Not sure which, if either, 
>is better.

I developed the coverage.py interface so that it will support 
coverage testing in a variety of testing scenarios.

In the coverage.py model you run tests, generate a report, use the 
report to direct your testing activity to fruitful locations in the 
code, generate a new report based on all the testing so far, and so 
on.

When you're testing things by hand, or when you're developing a test 
suite, it's important to be able to accumulate coverage information 
over a series of tests.  It may not be cost-effective to go back and 
run the whole set of tests again with coverage turned on (either 
because the tests are expensive to set up or take too long to 
execute).
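
A minimal sketch of what accumulating coverage over several runs
could look like. This is an illustration only: the pickle file and
the names load_coverage and save_merged are assumptions made for this
example, not the storage format or API that coverage.py uses.

```python
import os
import pickle
import tempfile

def load_coverage(path):
    """Return the previously saved coverage dict, or an empty one."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    return {}

def save_merged(path, new_hits):
    """Merge this run's hits into the saved data and write it back."""
    merged = load_coverage(path)
    merged.update(new_hits)
    with open(path, "wb") as fh:
        pickle.dump(merged, fh)
    return merged

# Two separate "test runs" that touch different lines of a
# hypothetical file example.py.
data_file = os.path.join(tempfile.mkdtemp(), "coverage_data.pickle")
save_merged(data_file, {("example.py", 1): 1, ("example.py", 2): 1})
total = save_merged(data_file, {("example.py", 2): 1, ("example.py", 5): 1})
print(sorted(total))
```

Because each run only adds entries to the saved dictionary, a report
generated at any point reflects all the testing done so far, and no
earlier test has to be repeated.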

Tracing is a different kind of activity from coverage testing and 
probably needs a different user interface.

>coverage.py is much faster.  In my tests, coverage.py takes less 
>than 2 seconds where trace.py takes 30 seconds.

It's obvious where the bottleneck is in a tracing or coverage 
application: it's the function that you pass to sys.settrace.  Here's 
the tracing function from coverage.py:

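c = {}  # maps (filename, line number) -> 1 for every line executed
def t(f, x, y):
    # f is the frame, x the event name, y the event argument
    c[(f.f_code.co_filename, f.f_lineno)] = 1
    return t  # keep tracing inside this frame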
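
For readers who have not used sys.settrace, here is a sketch of how a
function of this shape might be installed and removed. The longer
names, the sample function and the run_traced helper are assumptions
made for illustration; they are not taken from coverage.py.

```python
import sys

covered = {}  # (filename, line number) -> 1

def tracer(frame, event, arg):
    # Record that this line was reached, then keep tracing this frame.
    covered[(frame.f_code.co_filename, frame.f_lineno)] = 1
    return tracer

def sample(n):
    # Hypothetical function under test.
    if n > 0:
        return "positive"
    return "non-positive"

def run_traced(func, *args):
    """Call func with the tracer installed, restoring the old one after."""
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func(*args)
    finally:
        sys.settrace(previous)

result = run_traced(sample, 1)
hit_lines = sorted(line for name, line in covered)
print(result, hit_lines)
```

The line holding the second return statement never appears in the
dictionary, which is exactly the information a coverage report needs.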

Here's what led me to implement it this way:

    1. If you try to increment a count of the number of times a line 
has been executed then you need to handle the initial case (when 
there's no entry for the line of code) and integer overflow.  It's 
much cheaper to set the hash entry to 1.

    2. I thought at first that because many lines of code get executed 
from each file, it would be better to write

    c[f.f_code.co_filename][f.f_lineno] = 1

But it turns out that the extra code needed to handle the base cases 
takes more time than the construction of the pair, which happens in 
the Python core.

    3. You can make Python run faster by giving variables shorter 
names!  I presume that this is because variables are looked up by 
name in the environment at run time.  So variables with shorter names 
can be looked up more quickly.  Hence c, f, t, x, y in my code.
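
The alternatives weighed in points 1 and 2 can be sketched side by
side. This is an illustration with assumed names (trace_flat,
trace_counts, trace_nested, workload), not code from coverage.py, and
it makes no timing claims; it only shows the extra bookkeeping that
the counting and nested variants need. In current Python versions
integers have arbitrary precision, so the overflow concern applies
only to the Python of the time.

```python
import sys

flat = {}    # (filename, lineno) -> 1, as in the message
counts = {}  # (filename, lineno) -> number of executions
nested = {}  # filename -> {lineno: 1}

def trace_flat(frame, event, arg):
    flat[(frame.f_code.co_filename, frame.f_lineno)] = 1
    return trace_flat

def trace_counts(frame, event, arg):
    key = (frame.f_code.co_filename, frame.f_lineno)
    # The "initial case": no entry exists the first time a line is seen.
    counts[key] = counts.get(key, 0) + 1
    return trace_counts

def trace_nested(frame, event, arg):
    # The "base case": the per-file dict must be created on first use.
    nested.setdefault(frame.f_code.co_filename, {})[frame.f_lineno] = 1
    return trace_nested

def workload():
    total = 0
    for i in range(100):
        total += i
    return total

def run_with(tracer):
    """Run the workload with the given trace function installed."""
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return workload()
    finally:
        sys.settrace(previous)

for tracer in (trace_flat, trace_counts, trace_nested):
    run_with(tracer)

# All three record the same set of lines; only the bookkeeping differs.
nested_as_pairs = {(name, line)
                   for name, lines in nested.items()
                   for line in lines}
print(set(flat) == set(counts) == nested_as_pairs)
```

To compare speeds, each variant could be timed on a larger workload
with the timeit module; the results will depend on the Python version
in use.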

>We consider merging the best features of these two tools and 
>replacing the standard distribution's trace.py with the new merged 
>tool.

I think code sharing is appropriate between coverage testing and 
tracing tools (the parsing code in particular).  The tracing function 
itself would need to be different (for reasons of speed as explained 
above).  The user interfaces may need to be different (see my notes 
at the top of this e-mail), so I'm not 100% sure of the merits of a 
tool that tried to do both tasks.

I think the coverage.py licence is flexible enough that you shouldn't 
have any trouble re-using my code.  Let me know if you need my help.