Unit testing data; What to test

Alex Martelli aleaxit at yahoo.com
Sun Feb 18 03:16:32 EST 2001


"John J. Lee" <phrxy at csv.warwick.ac.uk> wrote in message
news:Pine.SOL.4.30.0102160309090.7729-100000 at mimosa.csv.warwick.ac.uk...
    [snip]
> The first question is straightforward: do people have a standard, simple
> way of handling data for tests, or do you just tend to stick most of it in
> the test classes?  KISS I suppose.  But if the software is going to change
> a lot, isn't it a good idea to separate the tests from their input and
> expected result data?  Of course with big data -- I have some mathematical
> functions that need to be checked for example -- you're obviously not
> going to dump it directly into the test code: I'm wondering more about data of
> the order of tens of objects (objects in the Python sense).

If you keep the (stimuli, expected_responses) sets separate from the
module being tested, you gain much the same benefits (and pay much
the same costs) as by keeping documentation separate from code for
other kinds of docs (think of tests as a form of executable docs...!).
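
For example (just a sketch -- the names test_data, mymodule and
frobnicate are invented for illustration), the data can live in a
module of its own as plain (stimulus, expected_response) pairs, and
the test code just loops over them:

    # test_data.py -- holds nothing but the data
    CASES = [
        # (stimulus, expected_response)
        ((2,), 4),
        ((0,), 0),
        ((-3,), -6),
    ]

    # test_frobnicate.py -- the test code proper
    import unittest
    from test_data import CASES
    from mymodule import frobnicate

    class FrobnicateTest(unittest.TestCase):
        def testKnownValues(self):
            for stimulus, expected in CASES:
                self.assertEqual(frobnicate(*stimulus), expected)

    if __name__ == '__main__':
        unittest.main()

Changing the data then means touching only test_data.py, while the
test logic stays put.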

Keeping some tests/docs together with the code simplifies distribution
issues: anybody who has your code also has this minimal set of tests
(and other docs).  However, in some cases, the total amount of tests
and other docs can easily swamp the amount of code -- and in that case
keeping them together may well be thought of as "too costly".  The
code can become practically unreadable if there are an order of magnitude
more lines of docs (including tests) than lines of actual source in a .py.

My favourite approach (and I do wish I were more systematic in
practising what I preach:-) is to try and strike a balance: keep with
the code (docstrings, comments, special-purpose test functions) a
reasonable minimal amount of tests (& other docs) -- indicatively,
roughly the same order of magnitude as the source code itself;
but _also_ have a separate set of docs (& tests) for more extensive
needs.  The ideal line of separation would be: is this stuff going to
be needed only by users who are also going to read or change the
sources, or is it going to be more generally useful?  Docs & tests that
only work at the *interface* of the module, without concern for its
*internals*, may allow many users to treat the module as a "black
box", only reading/running/enriching the separate docs-and-tests.


> In fact, do unit tests often end up having to be rewritten as code is
> refactored?  Presumably yes, given the (low) level that unit tests are
> written at.

This is another good guideline: docs and tests best kept with the
source are those that will likely need changing anyway when the code
is refactored.  The unit of reuse is the unit of release: when you
typically have internals changes that leave alone the interface of
the module, then that interface might usefully be documented and
tested "outside" the source code file -- you can imagine releasing
enhanced sources with identical "externals" docs/tests, or richer
tests/docs that require no re-release of the code itself.
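
Concretely (another sketch with invented names): an "externals" test
goes only through the documented interface, so it keeps passing across
internal refactorings and can be released on its own...

    # test_interface.py -- black-box, can be released separately
    import unittest
    import mymodule

    class InterfaceTest(unittest.TestCase):
        def testDocumentedBehaviour(self):
            # exercises only the public, documented behaviour
            self.assertEqual(mymodule.frobnicate(21), 42)

    if __name__ == '__main__':
        unittest.main()

...while an "internals" test pokes at private helpers and is best kept
right next to the source, since refactoring will force it to change
anyway:

    # test_internals.py -- white-box, changes with each refactoring
    import unittest
    import mymodule

    class InternalsTest(unittest.TestCase):
        def testPrivateHelper(self):
            # pokes at a (made-up) private helper _scale; expect to
            # rewrite this test whenever the internals are reworked
            self.assertEqual(mymodule._scale(3, 2), 6)

    if __name__ == '__main__':
        unittest.main()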


> The second (third) one is vague: What / How should one test?  Discuss.
>
> I realise these questions are not strictly python-specific, but (let's

So is much of what gets discussed here, and, as a born rambler, I
love the ethos of this group, which welcomes such 'somewhat OT'
discussions!-)

> see, how do I justify this) it seems most of what is out there on the web
> & USENET is either inappropriately formal, large-scale and
> business-orientated for me (comp.software.testing and its FAQ, for
> example), or merely describes a testing framework.  A few scattered XP
> articles are the only exceptions I've found.

If XP fits your needs, you could definitely do worse than adopt it
wholesale!  Yes, much of what gets discussed about software
development deals with large-scale SW (in testing and elsewhere) --
that's because problems grow non-linearly with SW scale... when
you release an application that's about 1,000 SLOC, it does not
really matter much if your process and approach are jumbled; at
10,000 SLOC, it's a problem; at 100,000, an utter nightmare, so
you HAVE to be precise in defining your process then (or, call it
100, 1000, 10,000 FP -- but it's really about SLOC more than it
is about FP, which is why higher-level languages help so much).

Unlike what XP specifies, I think tests should be in two
categories (and, similarly, so should deliverables be, rather than
the just-one-deliverable which so characterizes XP -- that is a
tad too X for my conservative self, even though I buy into 70+%
of XP's tenets!).  Which, again, leads us to the internal/external
tests-and-docs split.  External tests and docs (possibly, in large
scale devt, on several scales: module aka unit, subsystem, whole
system) deal with externals/interfaces (not just GUI's &c -- I'm
talking about, e.g., module interfaces to other software; _of course_
'engine' and 'presentation' SHOULD almost invariably be separate
components, but that's another plane of 'split').  Internal tests
and docs deal with internals -- the kind of thing that needs to be
tweaked at each refactoring.

That's not the same dividing plane as the classic unit vs system
test distinction -- it's slanted differently, and crops up again at
each granularity level in a large-enough project (minimal granule
being the module -- perhaps, at Python level, package -- not the
single function or class, since functions and classes inside one
module _are_ typically coupled too strongly to test/release/doc
independently... there are exceptions, modules that are not very
cohesive but rather collections of somewhat unrelated stuff for
basically 'packaging' purposes, but they should be the exception
rather than the rule).


> I'm sure there must be good and bad ways to test -- for example, I read
> somewhere (can't find it now) that you should aim to end up so that each
> bug generates, on (mode) average, one test failure, or at least a small
> number.  The justification for this was that lots of tests failing as a
> result of a single bug are difficult to deal with.  It seems to me that
> this goal is a) impossible to achieve and b) pointless, since if multiple
> test failures really are due to a single bug, they will all go away when
> you fix it, just as compile-time errors often do.  No?

True.  However, if tests are designed in terms of a _sequence_, it IS
often possible to arrange for the most fundamental tests to be run
*first*, ensuring minimal workability of some lower-level, call it
'infrastructural', kind of objects, so that dependent (higher level)
parts can be tested _assuming_ the lower-level stuff works.  This
is more of a consideration for 'internals' kinds of tests, IMHO.
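
With unittest, for instance, one way to sketch such a sequence is an
explicit suite that loads the "infrastructural" tests before the tests
that depend on them (the test-module names here are, again, made up):

    # run_tests.py -- run low-level tests first, dependent ones after
    import unittest
    import test_lowlevel, test_highlevel

    def suite():
        loader = unittest.TestLoader()
        s = unittest.TestSuite()
        # infrastructure first: if these fail, the rest is noise
        s.addTests(loader.loadTestsFromModule(test_lowlevel))
        # higher-level tests may then assume that layer works
        s.addTests(loader.loadTestsFromModule(test_highlevel))
        return s

    if __name__ == '__main__':
        unittest.TextTestRunner(verbosity=2).run(suite())

A failure early in the run then points at the fundamental breakage,
rather than at the cascade of symptoms above it.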


Alex