coverage.py: "Statement coverage is the weakest measure of code coverage"

Sun Oct 28 20:38:01 EDT 2007

On Oct 28, 4:56 pm, Ben Finney <b... at benfinney.id.au> wrote:
> Howdy all,
>
> Ned Batchelder has been maintaining the nice simple tool 'coverage.py'
> <URL:http://nedbatchelder.com/code/modules/coverage.html> for
> measuring unit test coverage.
>
> On the same site, Ned includes documentation
> <URL:http://nedbatchelder.com/code/modules/rees-coverage.html> by the
> previous author, Gareth Rees, who says in the "Limitations" section:
>
>     Statement coverage is the weakest measure of code coverage. It
>     can't tell you when an if statement is missing an else clause
>     ("branch coverage"); when a condition is only tested in one
>     direction ("condition coverage"); when a loop is always taken and
>     never skipped ("loop coverage"); and so on. See [Kaner 2000-10-17]
>     <URL:http://www.kaner.com/pnsqc.html> for a summary of test
>     coverage measures.
>
> So, measuring "coverage of executed statements" reports complete
> coverage incorrectly for an inline branch like 'foo if bar else baz',
> or a 'while' statement, or a 'lambda' statement. The coverage is
> reported complete if these statements are executed at all, but no
> check is done for the 'else' clause, or the "no iterations" case, or
> the actual code inside the lambda expression.
>
> What approach could we take to improve 'coverage.py' such that it
> *can* instrument and report on all branches within the written code
> module, including those hidden inside multi-part statements?
>
> --
>  \            "Technology is neither good nor bad; nor is it neutral." |
>   `\                       -Melvin Kranzberg's First Law of Technology |
> _o__)                                                                  |
> Ben Finney

Well, having used it for Python FIT, I've looked at some of its
deficiencies. Not enough to do anything about it (although I did
submit a patch to a different coverage tool), but enough to come to a
few conclusions.

There are two primary limitations: first, it runs off of the debug or
trace hooks in the Python interpreter, and second, it's got lots of
little problems due to inconsistencies in the way the compiler tools
generate parse trees.
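
To make the first limitation concrete, here is a minimal sketch of
line-level tracing with sys.settrace. It is not how coverage.py is
implemented; the names pick, tracer and executed are made up for
illustration. The point is that the hook reports line events only, so a
conditional expression counts as covered as soon as the line runs once.

```python
import sys

def pick(flag):
    return "yes" if flag else "no"   # two branches on a single line

executed = set()   # hypothetical store of (function name, line number)

def tracer(frame, event, arg):
    # The trace hook reports 'line' events; it says nothing about which
    # arm of a conditional expression was evaluated on that line.
    if event == "line":
        executed.add((frame.f_code.co_name, frame.f_lineno))
    return tracer

sys.settrace(tracer)
try:
    result = pick(True)   # only the 'yes' arm ever runs
finally:
    sys.settrace(None)

pick_lines = sorted(line for name, line in executed if name == "pick")
print(result, pick_lines)
```

The single line in pick is recorded as executed even though the 'no'
arm never ran, which is the kind of gap the quoted "Limitations" text
describes.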

It's not like there are a huge number of ways to do coverage. At the
low end you just count the number of times you hit a specific point,
and then analyze that.

At the high end, you write a trace to disk, and analyze that.
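
A rough illustration of the difference between the two ends, using a
made-up list of executed line numbers (the trace data here is
hypothetical, not output from any real tool). Hit counts throw away
ordering; an ordered trace keeps which line followed which.

```python
from collections import Counter

# Hypothetical ordered trace of executed line numbers for a small loop:
# line 2 is the loop header, line 3 the loop body, line 5 follows the loop.
trace = [1, 2, 3, 2, 3, 2, 5]

# Low end: keep only how many times each point was hit.
hit_counts = Counter(trace)

# High end: keep the order, so line-to-line transitions can be analyzed.
transitions = set(zip(trace, trace[1:]))

print(hit_counts[2])        # how often the loop header ran
print(sorted(transitions))  # which line was followed by which
```

In a real tool the trace would be written to disk and analyzed
afterwards; the in-memory list just keeps the sketch short.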

Likewise, on the low end you take advantage of existing hooks, like
Python's debug and trace hooks; on the high end you instrument the
program yourself, either by rewriting it to put trace or count
statements everywhere, or by modifying the bytecode to do the same
thing.

If I was going to do it, I'd start by recognizing that Python doesn't
have hooks where I need them, and it doesn't have a byte code
dedicated to a debugging hook (I think). In other words, the current
coverage.py tool is getting the most out of the available hooks: the
ones we really need just aren't there.

I'd probably opt to rewrite the programs (automatically, of course) to
add instrumentation statements. Then I could wallow in data to my
heart's content.
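
One way the rewriting approach could look, sketched with the standard
ast module. This is an assumption about how such a tool might be built,
not an existing feature of coverage.py; _hit, BranchProbes and the
branch-id format are hypothetical names. It wraps each arm of a
conditional expression in a probe call, so the arms are counted
separately even though they share a line.

```python
import ast

hits = set()

def _hit(branch_id, value):
    # Hypothetical probe: record that a branch ran, pass its value through.
    hits.add(branch_id)
    return value

class BranchProbes(ast.NodeTransformer):
    """Wrap both arms of every conditional expression in a _hit() call."""

    def __init__(self):
        self.expected = set()   # every branch id we instrumented

    def _wrap(self, expr, arm):
        branch_id = f"{expr.lineno}:{expr.col_offset}:{arm}"
        self.expected.add(branch_id)
        return ast.Call(
            func=ast.Name(id="_hit", ctx=ast.Load()),
            args=[ast.Constant(value=branch_id), expr],
            keywords=[],
        )

    def visit_IfExp(self, node):
        self.generic_visit(node)   # instrument nested expressions first
        node.body = self._wrap(node.body, "then")
        node.orelse = self._wrap(node.orelse, "else")
        return node

source = "def pick(flag):\n    return 'yes' if flag else 'no'\n"
probes = BranchProbes()
tree = ast.fix_missing_locations(probes.visit(ast.parse(source)))

namespace = {"_hit": _hit}
exec(compile(tree, "<instrumented>", "exec"), namespace)
result = namespace["pick"](True)   # exercise only the 'then' arm

missed = probes.expected - hits
print(result, sorted(missed))
```

After the run, missed holds the id of the 'else' arm, which a
statement-level report would not show. The same idea could be extended
to 'if', 'while' and lambda bodies, at the cost of more node types to
handle.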

One last little snark: how many of us keep our statement coverage
above 95%? Statement coverage may be the weakest form of coverage, but
it's also the simplest to handle.

John Roth