Where to put the error handing test?

Lie Ryan lie.1296 at gmail.com
Tue Nov 24 16:17:09 EST 2009


Peng Yu wrote:
> On Tue, Nov 24, 2009 at 4:58 AM, Dave Angel <davea at ieee.org> wrote:

I'll put an extra emphasis on this:
>> Your question is so open-ended as to be unanswerable.  
========================================================

> I'll still confused by the guideline that an error should be caught as
> early as possible.

but not too early. Errors must be RAISED as early as possible. Errors 
must be CAUGHT as soon as there is enough information to handle the errors.

> Suppose I have the following call chain
> 
> f1() --> f2() --> f3() --> f4()
> 
> The input in f1() might cause an error in f4(). However, this error
> can of cause be caught by f1(), whenever I want to do so. In the worst
> case, I could duplicate the code of f2 and f3, and the test code in f4
> to f1(), to catch the error in f1 rather than f4. But I don't think
> that this is what you mean.

Why would f1() have faulty data? Does it come from external input? Then 
the input must be invalidated at f1(). Then f2(), f3(), and f4() does 
not require any argument checking since it is assumed that f1() calls 
them with good input.

Of course, there is cases where data validation may not be on f1. For 
example, if f1() is a function that receives keyboard input for a 
filename f1 validates whether the filename is valid. Then f2 might open 
the file and validate that the file is of the expected type (if f2 
expect a .csv file, but given an .xls file, f2 would scream out an 
error). f3 received rows/lines of data from f2 and extracts 
fields/columns from the rows; but some of the data might be faulty and 
these must be filtered out before the data is transformed by the f4. f4 
assumes f3 cleans the row and so f4 is does not do error-checking. The 
transformed data then is rewritten back by f3. Now there is an f5 which 
creates new data, the new data needs to be transformed by f4 as well and 
since f5 creates a new data, there is no way for it to create invalid 
data. Now assume f6 merges new data from another csv file; f6's data 
will also be transformed by f4, f6 can use the same validation as in f3 
but why not turn f2,f3 instead to open the other file? Now f7 will merge 
data from multiple f3 streams and transforms by f4. Then comes f8 which 
creates new data from user input, the user-inputted data will need some 
checking; and now we have trouble since the data validation is inside 
f3. But we can easily factor out the validation part into f9 and call f9 
from f3 and f8.

The example case still checks for problem as early as possible. It 
checks for problem when it is possible to determine whether a particular 
condition is problematic. The as early as possible guideline does not 
mean f1, a filename input function, must check whether the csv fields 
contains valid data. But f2, the file opening function, must be given a 
valid filename; f3, the file parser, must be given a valid file object 
with the proper type; f4, the transformer, must be given a valid row to 
transform; etc. f2 should not need to check it is given a valid 
filename, it's f1 job to validate it; f3 should not need to check 
whether the file object is of the proper type; f4 should not need to 
check it is given a valid row tuple; and so on...

> Then the problem is where to put the test code more effectively. I
> would consider 'whether it is obvious to test the condition in the
> give function' as the guideline. However, it might be equal obvious to
> test the same thing two functions, for example, f1 and f4.
> 
> In this case, I thought originally that I should put the test code in
> f1 rather than f4, if f1, f2, f3 and f4 are all the functions that I
> have in the package that I am making. But it is possible that some
> time later I added the function f5(),...,f10() that calls f4(). Since
> f4 doesn't have the test code, f5(),...,f10() should have the same
> test code. This is clearly a redundancy to the code. If I move the
> test code to f4(), there is a redundancy of the code between f1 and
> f4.

I can't think of a good example where such redundancy would happen and 
there is no other, better way to walk around it. Could you provide a 
CONCRETE example? One that might happen, instead one that could 
theoretically happen.



More information about the Python-list mailing list