How to test?

Sat Apr 25 23:26:58 EDT 2020

On 25/04/20 7:53 PM, Manfred Lotz wrote:
> On Sat, 25 Apr 2020 18:41:37 +1200
> DL Neil <PythonList at DancesWithMice.info> wrote:
> 
>> On 25/04/20 5:16 PM, Manfred Lotz wrote:
>>> On Fri, 24 Apr 2020 19:12:39 -0300
>>> Cholo Lennon <chololennon at hotmail.com> wrote:
>>>    
>>>> On 24/4/20 15:40, Manfred Lotz wrote:
>>>>> I have a command like application which checks a directory tree
>>>>> for certain things. If there are errors then messages will be
>>>>> written to stdout.

 > What I do here specifically is to check directory trees' file objects
 > for user and group ownerships as well as permissions according to a
 > given policy. There is a general policy and there could be exceptions
 > which specify a specific policy for a certain file.

If I have understood correctly, the objective is to check a dir-tree to 
ensure that specific directory/file-permissions are in-effect/have not 
been changed. The specifications come from a .JSON file and may be 
over-ridden by command-line arguments. Correct?

There must be a whole 'genre' of programs which inspect a directory-tree 
and doing 'something' to the files-contained. I had a few, and needing 
another, decided to write a generic 'scanner' which would then call a 
tailored-function to perform the particular 'something' - come to think 
of it, am not sure if that was ever quite finished. Sigh!

 >>>>> One idea was for the error situations to write messages to files
 >>>>> and then later when running the tests to compare the error
 >>>>> messages output to the previously saved output.
 >>>>>
 >>>>> Is there anything better?

The next specification appears to be that you want a list of files and 
their stats, perhaps reporting only any exceptions which weren't there 
'last time'.

In which case, an exception could be raised, or it might be simpler to 
call a reporting-function when a file should be 'reported'.

I'm still a little confused about the difference between 
printing/logging/reporting some sort of exception, and the need to 
compare with 'history'.

The problem with the "previously saved output" is the process of linking 
'today's data' with that from the last run. I would use a database (but 
then, that's my background/bias) to store each file's stat.

Alternately, if you only need to store exceptions, and that number is 
likely to be small, perhaps output only the exceptions from 'this run' 
to a .JSON/.yaml file - which would become input (and a dict for easy 
look-ups) next time?

 >>>> Maybe I am wrong because I don't understand your scenario: If your
 >>>> application is like a command, it has to return an error code to
 >>>> the system, a distinct number for each error condition. The error
 >>>> code is easier to test than the stdout/stderr.

Either way, you might decrease the amount of data to be stored by 
reducing the file's stat to a code.

>>>>> How to test this in the best way?

The best way to prepare for unit-testing is to have 'units' of code 
(apologies!).

An effective guide is to break the code into functions and methods, so 
that each performs exactly one piece/unit of work - and only the one. A 
better guide is that if you cannot name the procedure using one 
description and want to add an "and" an "or" or some other conjunction, 
perhaps there should be more than one procedure!

For example:

 >    for cat in cats:
 >          ...
 >          for d in scantree(cat.dir):
 >              # if `keep_fs` was specified then we must
 >              # make sure the file is on the same device
 >              if cat.keep_fs and devid != get_devid(d.path):
 >                  continue
 >
 >              cat.check(d)

If this were a function, how would you name it? First of all we are 
processing every category, then we are scanning a dir-tree, and finally 
we are doing something to each directory found.

If we split these into separate routines, eg (sub-setting the above):

 >          for d in scantree(cat.dir):
                do_something_with_directory( d )

and you devise a (much) more meaningful name than mine, it will actually 
help readers (others - but also you!) to understand the steps within the 
logic.

Now if we have a function which checks a single fileNM for 'whatever', 
the code will start something like:

	def stat_file( fileNM ):
		'''Gather stat information for nominated file.'''
		etc

So, we can now write a test-function because we don't need any 
"categories", we don't need all the dir-tree, and we don't need to have 
performed a scan - all we need is a (valid and previously inspected) 
file-path. Thus (using Pytest):

	def test_stat_file( ... ):
		'''Test file stat.'''
		assert stat_file( "...file-path" ) == ...its known-stat

	def test_stat_dir( ... ):
		'''Test file stat.'''
		assert stat_file( "...dir-path" ) == ...its known-stat

There is no need to test for a non-existent file, if you are the only user!

In case you hadn't thought about it, make the test path, part of your 
test directory - not part of 'the real world'!

This is why TDD ("Test-Driven Development) theory says that one should 
devise tests 'first' (and the actual code, 'second'). You must design 
the logic first, and then start bottom-up, by writing tests to 'prove' 
the most simple functionality (and then the code to 'pass' those tests), 
gradually expanding the scope of the tests as single functions are 
combined in calls from larger/wider/controlling functions...

This takes the previous time-honored practice of writing the code first, 
and then testing it (or even, 'throwing it over the wall' to a test 
department), and turns it on its head. (no wall necessary)

I find it a very good way of making sure that when coding a system 
comprising many 'moving parts', when I make "a little change" to some 
component 'over here', the test-regime will quickly reveal any 
unintended consequences (effects, artifacts), etc, 'over there' - even 
though my (very) small brain hadn't realised the impact! ("it's just 
one, little, change!")

> The policy file is a JSON file and could have different categories.
> Each category defines a policy for a certain directory tree. Comand
> line args could overwrite directory names, as well as user and group.
> Or if for example a directory is not specified in the JSON file I am
> required to specify it via command line. Otherwise no check can take
> place.
> default":
>    {
>        "match_perms":  "644",
>        "match_permsd":  "755",
>        "match_permsx":  "755",
>        "owner":  "manfred",
>        "group":  "manfred"
>    }

Won't a directory-path be required, to tie 'policy' to 'tree'?

Is that enough feedback to help you take a few more steps?
-- 
Regards =dn