Exhaustive Unit Testing

Sat Nov 29 07:06:00 EST 2008

On Sat, 29 Nov 2008 11:36:56 +0100, Roel Schroeven wrote:

> Fuzzyman schreef:
>> By the way, to reduce the number of independent code paths you need to
>> test you can use mocking. You only need to test the logic inside the
>> methods you create (testing behaviour), and not every possible
>> combination of paths.
> 
> I don't understand that. This is part of something I've never understood
> about unit testing, and each time I try to apply unit testing I bump up
> against, and don't know how to resolve. I find it also difficult to
> explain exactly what I mean.
> 
> Suppose I need to write method spam() that turns out to be somewhat
> complex, like the class method Emanuele was talking about. When I try to
> write test_spam() before the method, I have no way to know that I'm
> going to need so many code paths, and that I'm going to split the code
> out into a number of other functions spam_ham(), spam_eggs(), etc.
> 
> So whatever happens, I still have to test spam(), however many codepaths
> it contains? Even if it only contains a few lines with fors and ifs and
> calls to the other functions, it still needs to be tested? Or not? 

The first thing to remember is that it is impractical for unit tests to 
be exhaustive. Consider the following trivial function:

def add(a, b):  # a and b ints only
    return a+b+1

Clearly you're not expected to test *every imaginable* path through this 
function (ignoring unit tests for error handling and bad input):

assert add(0, 0) == 1
assert add(1, 0) == 2
assert add(2, 0) == 3
assert add(3, 0) == 4
...
assert add(99736263, 8264891001) == 8364627265
...

Instead, your tests for add() can rely on the + operator being 
sufficiently tested that you can trust it, and so you only need to test 
the logic of your function. To do that, it would be sufficient to test a 
relatively small representative sample of data. One test would probably 
be sufficient:

assert add(1, 3) == 5

That test would detect almost all bugs in the function, although of 
course it won't detect every imaginable bug. A second test will make the 
chances of such false negatives virtually disappear.
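
To make that concrete, here is a rough sketch of how one or two sample 
tests separate a correct add() from a few plausible buggy versions. The 
buggy variants and the passes() helper are invented for this illustration 
only.

```python
def add(a, b):  # a and b ints only
    return a + b + 1

# Hypothetical buggy variants, made up for this illustration.
def add_forgot_one(a, b):
    return a + b

def add_wrong_sign(a, b):
    return a - b + 1

def add_sneaky(a, b):
    return a * b + 2  # happens to agree with add() at (1, 3)

def passes(func, cases):
    """Return True if func gives the expected result for every case."""
    return all(func(*args) == expected for args, expected in cases)

one_test = [((1, 3), 5)]
two_tests = one_test + [((2, 2), 5)]

# Print, for each candidate, whether it survives one test and two tests.
for candidate in (add, add_forgot_one, add_wrong_sign, add_sneaky):
    print(candidate.__name__,
          passes(candidate, one_test),
          passes(candidate, two_tests))
```

The first test alone rejects the two obvious bugs but lets add_sneaky 
through; the second test catches it.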

Now imagine a more complicated function:

def spam(a, b):
    return spam_eggs(a, b) + spam_ham(a) - 2*spam_tomato(b)

Suppose spam_eggs has four paths that need testing (paths A, B, C, D), 
spam_ham and spam_tomato have two each (E F and G H), and let's assume 
that they are all independent. Then your spam unit tests need to test 
every path:

A E G
A E H
A F G
A F H
B E G
B E H
...
D F H

for a total of 4*2*2=16 paths, in the spam unit tests.
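
If you want to see the full list rather than take the arithmetic on 
trust, a small sketch with itertools.product enumerates it. The path 
labels are the ones used above; the variable names are my own.

```python
from itertools import product

eggs_paths = "ABCD"    # four paths through spam_eggs
ham_paths = "EF"       # two paths through spam_ham
tomato_paths = "GH"    # two paths through spam_tomato

# Every combination of one path from each helper.
combinations = [" ".join(combo)
                for combo in product(eggs_paths, ham_paths, tomato_paths)]

for line in combinations:
    print(line)
print(len(combinations), "paths in total")
```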

But suppose that we have tested spam_eggs independently. It has four 
paths, so we need four tests to cover them all. Now our spam testing can 
assume that spam_eggs is correct, in the same way that we earlier assumed 
that the plus operator was correct, and reduce the number of tests to a 
small set of representative data.

No matter which path through spam_eggs we take, we can trust the result, 
because we have unit tests that will fail if spam_eggs has a bug. So 
instead of testing every path, I choose a much more limited set:

A E G
A E H
A F G
A F H

I arbitrarily choose path A alone, confident that paths B C and D are 
correct, but of course I could make other choices. There's no need to 
test paths B C and D *within spam's unit tests*, because they are already 
tested elsewhere. To test them again within spam doesn't gain me anything.

Consequently, we reduce our total number of tests from 16 to 8 (four 
tests for spam, four for spam_eggs).
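
Here is a minimal sketch of what those eight tests might look like. The 
bodies of spam_eggs, spam_ham and spam_tomato are pure invention, chosen 
only so that each has the number of paths described above; your real 
helpers will obviously differ.

```python
# Hypothetical helper bodies, invented so each has the stated number of paths.
def spam_eggs(a, b):
    if a >= 0 and b >= 0:
        return a + b      # path A
    elif a >= 0:
        return a * b      # path B
    elif b >= 0:
        return a - b      # path C
    return 0              # path D

def spam_ham(a):
    return a // 2 if a % 2 == 0 else 3 * a + 1   # paths E, F

def spam_tomato(b):
    return b if b % 2 == 0 else b + 1            # paths G, H

def spam(a, b):
    return spam_eggs(a, b) + spam_ham(a) - 2*spam_tomato(b)

# Four tests for spam_eggs, one per path.
assert spam_eggs(1, 2) == 3      # A
assert spam_eggs(1, -2) == -2    # B
assert spam_eggs(-1, 2) == -3    # C
assert spam_eggs(-1, -2) == 0    # D

# Four tests for spam, holding spam_eggs on path A throughout.
assert spam(2, 4) == -1   # A E G
assert spam(2, 3) == -2   # A E H
assert spam(1, 4) == 1    # A F G
assert spam(1, 3) == 0    # A F H
```

Every argument pair in the spam tests is non-negative, so spam_eggs stays 
on path A while the parity of a and b selects the other paths.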

> From
> a number of postings in this thread I get the impression (though that
> might be an incorrect interpretation) that many people are content to
> only test the various helper functions, and not the spam() itself. You
> say you don't have to test every possible combination of paths, but how
> thorough is your test suite if you have untested code paths?

The success of this tactic assumes that you can identify code paths and 
make them independent. If they are dependent, then you can't be sure that 
path E G after A is the same as E G after D.

Real world example: compare driving your car from home to the mall to the 
park with driving from work to the mall to the park. The journey 
from the mall to the park is the same, no matter how you got to the mall. 
If you can drive from home to the mall and then to the park, and you can 
drive from work to the mall, then you can be sure that you can drive from 
work to the mall to the park even though you've never done it before.

But if you can't be sure the paths are independent, then you can't make 
that simplifying assumption, and you do have to test more paths in more 
places.
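
As an illustration of how paths can become dependent, here is a sketch in 
which one helper leaves shared state behind for another. The settings 
dict, the helper bodies and ham_after() are all hypothetical, made up to 
show the problem.

```python
# Hypothetical helpers sharing mutable state, so their paths are NOT independent.
settings = {"scale": 1}

def spam_eggs(a, b):
    if a < 0:                    # call this path D
        settings["scale"] = 10   # side effect that leaks into later calls
        return 0
    return a + b                 # call this path A

def spam_ham(a):
    # The result depends on which spam_eggs path ran earlier.
    return settings["scale"] * a

def ham_after(eggs_args, ham_arg):
    """Reset the shared state, run spam_eggs, then return spam_ham's result."""
    settings["scale"] = 1
    spam_eggs(*eggs_args)
    return spam_ham(ham_arg)

print(ham_after((2, 0), 3))    # spam_ham following path A
print(ham_after((-2, 0), 3))   # spam_ham following path D
```

The same call to spam_ham gives different answers depending on the route 
taken through spam_eggs, so testing it after path A tells you nothing 
about how it behaves after path D.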

> A related matter (at least in my mind) is this: after I've written
> test_spam() but before spam() is correctly working, I find out that I
> need to write spam_ham() and spam_eggs(), so I need test_spam_ham() and
> test_spam_eggs(). That means that I can never have a green light while
> coding test_spam_ham() and test_spam_eggs(), since test_spam() will
> fail. That feels wrong. 

I would say that means you're letting your tests get too far ahead of 
your code. In theory, you should never have more than one failing test at 
a time: the last test you just wrote. If you have to refactor code so 
much that a bunch of tests start failing, then you need to take those 
tests out, and re-introduce them one at a time. 
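
One possible way to "take a test out" without deleting it is the standard 
library's unittest.skip decorator. This is just a sketch of that one 
option, not the only approach; the spam_ham body and the test class are 
made up for the example.

```python
import unittest

def spam_ham(a):
    return a + 1   # hypothetical helper, already written

class SpamTests(unittest.TestCase):
    @unittest.skip("parked until spam_ham and spam_eggs are finished")
    def test_spam(self):
        self.fail("spam() is not written yet")

    def test_spam_ham(self):
        self.assertEqual(spam_ham(1), 2)

# Run the suite directly so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SpamTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful(), len(result.skipped))
```

The parked test is reported as skipped rather than failed, so you get 
your green light back while working on the helpers, and removing the 
decorator re-introduces the test.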

In practice, I can't imagine too many people have the discipline to 
follow that practice precisely. I know I don't :)

-- 
Steven