[Tutor] Unit testing in Python (3.3.0) for beginners

Sun Dec 8 13:42:46 CET 2013

On 12/08/2013 11:22 AM, Rafael Knuth wrote:
> Hey there,
>
> I struggle to understand what unit testing specifically means in
> practice and how to actually write unit tests for my code (my gut is
> telling me that it's a fairly important concept to understand).
> [...]

Hello Rafael,

This post is quite long, and partly personal. As you guess, testing is in fact 
an important topic, and there is much to say about it. As usual in programming, 
the domain comes with its load of ambiguity, contradictions and 
misunderstandings. Question number 0: why testing? My view is that testing is a 
tool that helps one making good software by finding bugs, or trying to. There 
are at least 2 interpretations of this:
1. Help & finding symptoms (signs) of errors, meaning obviously wrong things: I 
will this a control test
2. Help & finding the source of such symtoms, meaning the actual error: I call 
this a diagnosis test.
What a test has _not_ as purpose is proving your software correct; this is not 
possible (except if it is very simple and runs on a closed set of variable input 
data).

"Unit testing" means testing a whole, consistent section of source code: 
typically a function, a complex object, a data structtre, or a(nother) class. A 
test is then one or more test functions that try and find bugs in this section 
of code. In languages like Python with a prominent concept of module, there can 
be a module's whole test function, which usually just calls every test func in 
the module. This may look like this:

def test1 ():
     ...
def test2 ():
     ...
def test3 ():
     ...

def test ():
     test1()
     test2()
     test3()

if __name__ == "__main__":
     test()  # to comment out when running without test
     pass

Now, what does each test func do?  Since you just discovered classes, I will use 
as example a simple Class 'Position' (or part of it):

class Position:
     def __init__ (self, x=0, y=0):
         self.x, self.y = x, y
     def move (self, dt):
         # dt is delta-time (difference of time)

Move changes the position according to elapsed time and a movement scheme, maybe 
as complicated as linear in time (I'm joking); maybe when reaching a border (eg 
screen limit), an object bounces or magically jumps to the other side; or maybe 
position also holds vx & vy speed coordinates (velocity vector), which also change.

To test move, one would provide a series of test data, here a list of dt values. 
Then create a specimen of Position, run move for each dt, and for each case 
_check_ that the resulting position is ok.

     # a series of checks like:
     pos.move(dt)
     ok = (pos.x == correct_x) and (pos.y == correct_y)
     if ok:
         ...
     else:
         ...

You may simplify by providing a Position '==' method (__eq__)

     def __eq__ (self, other)
         return (self.x == other.x) and (self.y == other.y)
         # or just tuple equality:
         return (self.x, self.y) == (other.x, other .y)

Which permits checking more simply:

     # a series of checks like:
     pos.move(dt)
     if  (pos == correct_pos):
         ...
     else:
         ...

Now consider a situation where instead of move we have:

     def movement (dt):
         ...
         return (dx, dy)

'move' was an "action-function" some performs an effect, thus one checks its 
effect. 'movement' is an "evaluation function", a function properly speaking 
that computes some value (here, a pair of values), thus we can directly check 
its output result:

     # a series of checks like:
     if pos.movement(dt) == correct_movement:     # a tuple (dx, dy):
         ...
     else:
         ...

As _simple_ as that. And tests should definitely be simple, stupid. Otherwise we 
would have to test tests (it happened to me ;-). There are several sources of 
error in tests: base objects (here a position), test data (dt values), test 
correct results (new positions), etc... plus test logic if you don't make it 
trivial enough. A wrong test finds "false positives", meaning wrong errors where 
there are none, and "false negatives", meaning good outcomes which actually are 
bad. One may spoil tons of time because of that... (this also happened to me, in 
fact numerous times). Tests must be very simple.

Another question is: what to check? As you probably guess, all kinds of special 
values, border values, exceptional values, are the ones to test in priority. In 
our case, dt=0, dt=1, dt=big, dt=very_very_big. But there is a whole category of 
checks often completely forgotten, namely failure checks: most methods should 
just fail on wrong or dubious input (which I call anomalies). What happens if dt 
is negative? What if it is is an int instead of a float, or conversely (python 
would happily compute if you don't mind)? What if it's not anumber at all? To 
test that, if failure of a given function translates to raising an exception 
(stop program execution with an error message), your check must catch the 
possible exception. (Maybe you don't know that yet.)

And most importantly, what should we actually do on dt=0? Does it make any sense 
for client code (the caller of Position.move) to ask for movement in no time at 
all? My strategy (this is very personal) is to fail on such dubious cases: the 
reason is that such kinds of 0's, as well as most cases of empty lists, sets, 
strings, etc, actually are symptoms of bugs somewhere else in the client source 
code. Very probably, dt should not be 0; it means there is certainly a logical 
error where dt is defined or computed. That the method or function fails (throws 
an exception) on such anomalies helps the programmer know about probable bugs. 
(Anyway, if it's not a bug, why call move for dt=0)?

Now, what do we do for each check when we know its outcome (good, bad)? This is 
where the two categories of tests (1. & 2. above), maintenance & diagnosis, 
enter the stage.

1. Say you just modified the code, either of Position, or of some related piece 
of your logic (something Position uses, eg movement schemes). This may be during 
initial development or in a maintenance phase. In this case, you mostly want to 
be notified of check failures. Initially, if there are several ones, you want to 
know all failures, because they may be related. This tells the expected 
behaviour of checks, once they they know the outcome: if bad, write a succint 
check-failure report (on standard error channel, meaning usually just on the 
terminal), then continue with the test suite.
Whenever one makes such a modification on code that (apparently) was more or 
less correct, one should run all control tests of all possibly related parts of 
the code base, or even impossibly related ones, because there are often hidden 
relations (possibly showing design errors).

2. Now, say you know or suspect a bug in your code; or maybe you are just not 
sure of what happens, of what your code actually does, how and why. This is a 
situation of diagnosis; tests may here serve to help you understand better, 
possibly find an actual error, the source of misfonctionings you notice or 
suspect. You want to get all possibly relevant information about the execution. 
This time, most commonly, successful checks are good informaton: they show what 
happens in good cases, or rather cases you think are good. They also serve as 
reference for comparison with failing cases, which is invaluable. Since all is 
written down, you can literally see the outcomes, and compare visually.
Tests here are a very useful complement, or an alternativ, to debug prints or 
the usage of a debugger (search these 2 topics if you are not familiar with 
them). Most importantly, they provide information you are familiar with, in a 
familiar format.

This means you should define output formats for all relevant pieces of data. 
Even possibly two kinds, as Python offers: __repr__ may reproduce the notation 
in source code, and is mostly for programmer feedback, in tests and debugging in 
general; __str__ may be better used for user output, or sometimes in trivial 
cases be better suited for programmer feedback as well, as in the case of 
Position. This may give (just an example):

     def __repr__(self):         # "Position(1, 2)"
         ''' programmer feedback expression as "Position(x, y)" '''
         return "Position(%s, %s) % (self.x, self.y)"
     def __str__(self):          # "(x:1 y:2)"
         ''' user output expression as (x:x y:y) '''
         return "(x:%s y:%s) % (self.x self.y)"

Now, you can use such data output to write a nice report format for positive and 
negative test check outcomes, according to your preference, or models you notice 
in other people's code happen to like. They should just show all relevant 
information, and only that, in the clearest possible way. End of the story.
As a side-note, in my view, all of this should be built into every programming 
language, even designed right from the start, at least a simple but good and 
standard version.

Denis