Iterating over test data in unit tests

Mon Dec 5 20:19:40 EST 2005

Howdy all,

Summary: I'm looking for idioms in unit tests for factoring out
repetitive iteration over test data. I explain my current practice,
and why it's unsatisfactory.

When following test-driven development, writing tests and then coding
to satisfy them, I'll start with some of the simple tests for a class.

    import unittest

    import bowling      # Module to be tested

    class Test_Frame(unittest.TestCase):

        def test_instantiate(self):
            """ Frame instance should be created """
            instance = bowling.Frame()
            self.failUnless(instance)

    class Test_Game(unittest.TestCase):

        def test_instantiate(self):
            """ Game instance should be created """
            instance = bowling.Game()
            self.failUnless(instance)

As I add tests for more interesting functionality, they become more
data dependent.

    class Test_Game(unittest.TestCase):

        # ...

        def test_one_throw(self):
            """ Single throw should result in expected score """
            game = bowling.Game()
            throw = 5
            game.add_throw(throw)
            self.failUnlessEqual(throw, game.get_score())

        def test_three_throws(self):
            """ Three throws should result in expected score """
            game = bowling.Game()
            throws = (5, 7, 4)
            game.add_throw(throws[0])
            game.add_throw(throws[1])
            game.add_throw(throws[2])
            self.failUnlessEqual(sum(throws), game.get_score())

This cries out, of course, for a test fixture to set up instances.

    class Test_Game(unittest.TestCase):

        def setUp(self):
            """ Set up test fixtures """
            self.game = bowling.Game()

        def test_one_throw(self):
            """ Single throw should result in expected score """
            throw = 5
            score = 5
            self.game.add_throw(throw)
            self.failUnlessEqual(score, game.get_score())

        def test_three_throws(self):
            """ Three throws should result in expected score """
            throws = [5, 7, 4]
            score = sum(throws)
            for throw in throws:
                game.add_throw(throw)
            self.failUnlessEqual(score, game.get_score())

        def test_strike(self):
            """ Strike should add the following two throws """
            throws = [10, 7, 4, 7]
            score = 39
            for throw in throws:
                game.add_throw(throw)
            self.failUnlessEqual(score, game.get_score())

So far, this is just following what I see to be common practice for
setting up *instances* to test.

But the repetition of the test *inputs* also cries out to me to be
refactored. I see less commonality in doing this.

My initial instinct is just to put it in the fixtures.

    class Test_Game(unittest.TestCase):

        def setUp(self):
            """ Set up test fixtures """
            self.game = bowling.Game()

            self.game_data = {
                'one': dict(score=5, throws=[5]),
                'three': dict(score=17, throws=[5, 7, 5]),
                'strike': dict(score=39, throws=[10, 7, 5, 7]),
            }

        def test_one_throw(self):
            """ Single throw should result in expected score """
            throws = self.game_data['one']['throws']
            score = self.game_data['one']['score']
            for throw in throws:
                self.game.add_throw(throw)
            self.failUnlessEqual(score, game.get_score())

        def test_three_throws(self):
            """ Three throws should result in expected score """
            throws = self.game_data['three']['throws']
            score = self.game_data['three']['score']
            for throw in throws:
                game.add_throw(throw)
            self.failUnlessEqual(score, game.get_score())

        def test_strike(self):
            """ Strike should add the following two throws """
            throws = self.game_data['strike']['throws']
            score = self.game_data['strike']['score']
            for throw in throws:
                game.add_throw(throw)
            self.failUnlessEqual(score, game.get_score())

But this now means that the test functions are almost identical,
except for choosing one data set or another. Maybe that means I need
to have a single test:

        def test_score_throws(self):
            """ Game score should be calculated from throws """
            for dataset in self.game_data:
                score = dataset['score']
                for throw in dataset['throws']:
                    self.game.add_throw(throw)
                self.failUnlessEqual(score, self.game.get_score())

Whoops, now I'm re-using a fixture instance. Maybe I need an instance
of the class for each test case.

        def setUp(self):
            """ Set up test fixtures """
            self.game_data = {
                'one': dict(score=5, throws=[5]),
                'three': dict(score=17, throws=[5, 7, 5]),
                'strike': dict(score=39, throws=[10, 7, 5, 7]),
            }

            self.game_params = {}
            for key, dataset in self.game_data.items():
                params = {}
                instance = bowling.Game()
                params['instance'] = instance
                params['dataset'] = dataset
                self.game_params[key] = params

        def test_score_throws(self):
            """ Game score should be calculated from throws """
            for params in self.game_params.values():
                score = params['dataset']['score']
                instance = params['instance']
                for throw in params['dataset']['throws']:
                    instance.add_throw(throw)
                self.failUnlessEqual(score, instance.get_score())

Good, now the tests for different sets of throws are in a dictionary
that's easy to add to. Of course, now I need to actually know which
one is failing.

        def test_score_throws(self):
            """ Game score should be calculated from throws """
            for key, params in self.game_params.items():
                score = params['dataset']['score']
                instance = params['instance']
                for throw in params['dataset']['throws']:
                    instance.add_throw(throw)
                self.failUnlessEqual(score, instance.get_score(),
                    msg="Score mismatch for set '%s'" % key
                )

It works. It's rather confusing though, since the actual test --
iterate over the throws and check the score -- is in the midst of the
iteration over data sets.

Also, that's just *one* type of test I might need to do. Must I then
repeat all that iteration code for other tests I want to do on the
same data?

Maybe I need to factor out the iteration into a generic iteration
function, taking the actual test as a function object. That way, the
dataset iterator doesn't need to know about the test function, and
vice versa.

        def iterate_test(self, test_func, test_params=None):
            """ Iterate a test function for all the sets """
            if not test_params:
                test_params = self.game_params
            for key, params in test_params.items():
                dataset = params['dataset']
                instance = params['instance']
                test_func(key, dataset, instance)

        def test_score_throws(self):
            """ Game score should be calculated from throws """
            def test_func(key, dataset, instance):
                score = dataset['score']
                for throw in dataset['throws']:
                    instance.add_throw(throw)
                self.failUnlessEqual(score, instance.get_score())

            self.iterate_test(test_func)

That's somewhat clearer; the test function actually focuses on what
it's testing. Those layers of indirection are annoying, but they allow
the data sets to grow without writing more code to handle them.

Testing a rules-based system involves lots of data sets, and each data
set represents a separate test case; but the code for each of those
test cases is mindlessly repetitive. Factoring them out seems like it
needs a lot of indirection, and seems to make each test harder to
read. Different *types* of tests would need multiple iterators, more
complex test parameter dicts, or some more indirection. Those all
sound ugly, but so does repetitively coding every test function
whenever some new data needs to be tested.

How should this be resolved?

-- 
 \       "I never forget a face, but in your case I'll be glad to make |
  `\                                   an exception."  -- Groucho Marx |
_o__)                                                                  |
Ben Finney