Exhaustive Unit Testing

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sat Nov 29 22:42:50 EST 2008


On Sat, 29 Nov 2008 17:13:00 +0100, Roel Schroeven wrote:

> Except that I'm always told that the goal of unit tests, at least
> partly, is to protect us against mistakes when we make changes to the
> tested functions. They should tell me whether I can still trust spam()
> after refactoring it. Doesn't that mean that the unit test should see
> spam() as a black box, providing a certain (but probably not 100%)
> guarantee that the unit test is still a good test even if I change the
> implementation of spam()?

Yes, but you get to choose how strong that guarantee is. If you want to 
test the same thing in multiple places in your code, you're free to do 
so. Refactoring merely reduces the minimum number of tests you need for 
complete code coverage, not the maximum.

The aim here isn't to cut the number of unit tests down to the absolute 
minimum number required to cover all paths through your code, but to 
reduce that minimum number to something tractable: O(N) or O(N**2) 
instead of O(2**N), where N = some appropriate measure of code complexity.
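
To make that concrete with a toy example of my own devising: a function
with three independent branches has 2**3 = 8 distinct paths through it,
but each branch on its own can be covered by just two tests.

def describe(n):
    parts = []
    parts.append("negative" if n < 0 else "non-negative")
    parts.append("even" if n % 2 == 0 else "odd")
    parts.append("big" if abs(n) > 100 else "small")
    return ", ".join(parts)

# Exhaustive path coverage needs 8 tests; covering each branch
# separately needs only 6, and factoring the branches out into helper
# functions would let each pair of tests live with its own helper.
assert describe(-3) == "negative, odd, small"
assert describe(200) == "non-negative, even, big"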

It is desirable to have some redundant tests, because they reduce the 
chances of a freakish bug just happening to give the correct result for 
the test but wrong results for everything else. (Assuming of course that 
the redundant tests aren't identical -- you gain nothing by running the 
exact same test twice.) They also give you extra confidence that you can 
refactor the code without introducing such freakish bugs. But if you find 
yourself making such sweeping changes to your code base that you no 
longer have such confidence, then by all means add more tests!
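
For instance (an invented example, not anything from your code), two
tests that push different data down the same code path are redundant as
far as coverage goes, yet they are not identical:

def is_leap_year(year):
    # Standard Gregorian rule.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Both tests exercise the "divisible by 4, not by 100" branch, so
# either one alone gives the same coverage; a fluke that happens to
# get one of them right is unlikely to get both.
assert is_leap_year(2004)
assert is_leap_year(1996)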


 
> And I don't understand how that works in test-driven development; I
> can't possibly adapt the tests to the code paths in my code, because the
> code doesn't exist yet when I write the test.

That's where you should be using mocks and stubs to ease the pain.

http://en.wikipedia.org/wiki/Mock_object
http://en.wikipedia.org/wiki/Method_stub
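
A minimal sketch of the idea (the names are invented for illustration):
if spam() is going to depend on a fetch_price() that doesn't exist yet
-- say it will eventually talk to a database -- a stub returning a
canned value lets you write and run spam()'s tests first.

def fetch_price(item):
    # Stub: the real lookup doesn't exist yet, so return a canned
    # value that the test below can rely on.
    return 10.0

def spam(item, quantity):
    # The unit under test, written against the stub's interface.
    return fetch_price(item) * quantity

assert spam("eggs", 3) == 30.0

Dedicated mock libraries automate this sort of stand-in, but a
hand-rolled stub like the above is often all you need.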


 
>  > To test them again within spam doesn't gain me anything.
> 
> I would think it gains you the freedom of changing spam's implementation
> while still being able to rely on the unit tests. Or maybe I'm thinking
> too far?

No, you are right, and I over-stated the case.


[snip]
>> I would say that means you're letting your tests get too far ahead of
>> your code. In theory, you should never have more than one failing test
>> at a time: the last test you just wrote. If you have to refactor code
>> so much that a bunch of tests start failing, then you need to take
>> those tests out, and re-introduce them one at a time.
> 
> I still fail to see how that works. I know I must be wrong since so many
> people successfully apply TDD, but I don't see what I'm missing.
> 
> Let's take a more-or-less realistic example: I want/need a function to
> calculate the least common multiple of two numbers. First I write some
> tests:
> 
> assert(lcm(1, 1) == 1)
> assert(lcm(2, 5) == 10)
> assert(lcm(2, 4) == 4)

(Aside: assert is a statement, not a function, so you don't need the 
parentheses.)

Arguably, that's too many tests. Start with one.

assert lcm(1, 1) == 1

And now write lcm:

def lcm(a, b):
    return 1

That's a stub, and our test passes. So add another test:

assert lcm(2, 5) == 10

and the test fails. So let's fix the function by using gcd.

def lcm(a, b):
    return a/gcd(a, b)*b

(By the way: there's a subtle bug in lcm() that will hit you in Python 3. 
Can you spot it? Here's a hint: your unit tests should also assert that 
the result of lcm is always an int.)
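
Such a test might look something like this (my sketch of the check
suggested above):

# Make sure lcm() really returns an int, and not merely something
# that compares equal to 10 -- under Python 3 this catches the bug
# hinted at above.
result = lcm(2, 5)
assert result == 10
assert isinstance(result, int)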

Now that we've introduced a new function, we need a stub and a test for 
it:

def gcd(a, b):
    return 1

Why does the stub return 1? So it will make the lcm test pass. If we had 
more lcm tests, it would be harder to write a gcd stub, hence the 
insistence on adding only a single test at a time.

assert gcd(1, 1) == 1

Now all the tests pass and we get a nice green light. Let's add another 
test. We need to add it to the gcd test suite, because gcd() is the 
newest, least-trusted function. If you add a test to the lcm test suite 
and it fails, you don't know whether it failed because of an error in 
lcm() or because of an error in gcd(). So leave lcm alone until gcd is 
working:

assert gcd(2, 4) == 2

Now go and fix gcd. At some point you have to decide to stop using a 
stub for gcd and write the function properly. For a function this 
simple, "now" is that time, but just for the exercise let me write a 
slightly more complicated stub. This is (probably) the next simplest 
stub which allows all the gcd tests to pass while still being "wrong":

def gcd(a, b):
    if a == b:
        return 1
    else:
        return 2

When you're convinced gcd() is working, you can go back and add 
additional tests to lcm.
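
For the record, here is one sketch of what "properly" might look like
(Euclid's algorithm -- my addition, not part of the exercise above),
together with the tests accumulated so far:

def gcd(a, b):
    # Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b).
    while b:
        a, b = b, a % b
    return a

assert gcd(1, 1) == 1
assert gcd(2, 4) == 2
assert gcd(2, 5) == 1
assert lcm(1, 1) == 1
assert lcm(2, 5) == 10
assert lcm(2, 4) == 4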

In practice, of course, you can skip a few steps. It's hard to be 
disciplined enough to program in such tiny little steps. But the cost of 
being less disciplined is that it takes longer to have all the tests pass.



-- 
Steven


