How to test?

Mon Apr 27 02:21:39 EDT 2020

>> If I have understood correctly, the objective is to check a dir-tree
>> to ensure that specific directory/file-permissions are in-effect/have
>> not been changed. The specifications come from a .JSON file and may
>> be over-ridden by command-line arguments. Correct?
> 
> Yes.

>>>>>>> How to test this in the best way?
>>
>> The best way to prepare for unit-testing is to have 'units' of code
>> (apologies!).
>>
>> An effective guide is to break the code into functions and methods,
>> so that each performs exactly one piece/unit of work - and only the
>> one. A better guide is that if you cannot name the procedure using
>> one description and want to add an "and" an "or" or some other
>> conjunction, perhaps there should be more than one procedure!
>>
>> For example:
>>
>>   >    for cat in cats:
>>   >          ...
>>   >          for d in scantree(cat.dir):
>>   >              # if `keep_fs` was specified then we must
>>   >              # make sure the file is on the same device
>>   >              if cat.keep_fs and devid != get_devid(d.path):
>>   >                  continue
>>   >
>>   >              cat.check(d)
>>
> 
> Above is part of the main() function. But I could make some part of
> the main() function into its own function. Then the call to check a
> directory could be a function argument. For testing I could reuse that
> function by providing a different function for the check (now it is
> cat.checkid(d) )

There is no $charge for the number of functions used. Don't hesitate!

Whilst I haven't heard the phrase used recently, we used to talk about 
systems- and program-design being a process of "step-wise 
decomposition". It was likened to peeling layers off an onion, or 
Russian nesting dolls - as one takes 'off' the outer layer, there is 
more inside.

In programming then, the main() calls a bunch of other functions. Each 
of those, calls its own set of sub-functions, and so-on. At first, we 
are talking 'higher levels' of control - and a good choice of function 
names summarises what's going-on/provides the 'plan of attack' and thus 
good "documentation"!

As control passes to the 'lower level' of functions, the level of detail 
increases. You might think of it like a microscope - the more powerful 
the magnification, the more detail is seen; but the field of view has 
narrowed/become more focussed.

The idea of having one function perform one (and only one) function is 
known as "encapsulation". Another term which was a new 'buzz-word', 
decades ago, is "modular programming" - instead of having one long 
program, splitting it up into functional components.

Yet another traditional phrase describing computer programs is: "input, 
process, output". This can also be applied, at the successive levels of 
detail, to functions - they 'take in' parameter-values, 'process' them 
in some way, and then return, or 'output' the result.

In a simplistic world, one function's output would become input to the 
next function. However, back in the real-world, it is more likely that 
several 'outputs' from earlier functions are used as 'input' by some 
successive procedure. Which is why we are paid 'the big bucks' (maybe).

Earlier functions should only influence or affect the behavior of later 
functions by passing data (output from one, becoming input to another). 
The complex ways in which functions can 'feed' or 'affect' each other is 
called "cohesion". If each function is dependent only upon its input 
data, this is held as an ideal. If a function is not independent, that 
is likely to be and/or become a source of friction and bugs. Again, in 
an ideal world...

There's a lot of follow-through, if you'd care to do some reading using 
such key-words...

>> If this were a function, how would you name it? First of all we are
>> processing every category, then we are scanning a dir-tree, and
>> finally we are doing something to each directory found.
>>
>> If we split these into separate routines, eg (sub-setting the above):
>>
>>   >          for d in scantree(cat.dir):
>>                  do_something_with_directory( d )
>>
>> and you devise a (much) more meaningful name than mine, it will
>> actually help readers (others - but also you!) to understand the
>> steps within the logic.
>>
>> Now if we have a function which checks a single fileNM for
>> 'whatever', the code will start something like:
>>
>> 	def stat_file( fileNM ):
>> 		'''Gather stat information for nominated file.'''
>> 		etc
>>
>> So, we can now write a test-function because we don't need any
>> "categories", we don't need all the dir-tree, and we don't need to
>> have performed a scan - all we need is a (valid and previously
>> inspected) file-path. Thus (using Pytest):
>>
>> 	def test_stat_file( ... ):
>> 		'''Test file stat.'''
>> 		assert stat_file( "...file-path" ) == ...its
>> known-stat
>>
>> 	def test_stat_dir( ... ):
>> 		'''Test file stat.'''
>> 		assert stat_file( "...dir-path" ) == ...its known-stat
>>
>> There is no need to test for a non-existent file, if you are the only
>> user!
>>
>> In case you hadn't thought about it, make the test path, part of your
>> test directory - not part of 'the real world'!
...

> The comments/advices I got were pretty helpful.  I already have
> started to make improvements in my code to be able to test things
> better.
> 
> For example I have changed the check() method in the Policy class to
> get called like this:
> 
>        def check(self, fpath, user, group, mode):
> 
> This means I can test the checks and its result without requiring any
> external test data.
> 
> Thanks a lot to all for your help.

Given your replies, 'now' might be a good time to take a look at Pytest, 
and see how you could use it to help build better code - by building 
tested units/functions which are assembled into ever-larger tested-units...
(there is a range of choice/other testing aids if Pytest doesn't take 
your fancy)
-- 
Regards =dn