How to test?

Manfred Lotz ml_news at posteo.de
Sun Apr 26 06:46:01 EDT 2020


On Sun, 26 Apr 2020 15:26:58 +1200
DL Neil <PythonList at DancesWithMice.info> wrote:

> On 25/04/20 7:53 PM, Manfred Lotz wrote:
> > On Sat, 25 Apr 2020 18:41:37 +1200
> > DL Neil <PythonList at DancesWithMice.info> wrote:
> >   
> >> On 25/04/20 5:16 PM, Manfred Lotz wrote:  
> >>> On Fri, 24 Apr 2020 19:12:39 -0300
> >>> Cholo Lennon <chololennon at hotmail.com> wrote:
> >>>      
> >>>> On 24/4/20 15:40, Manfred Lotz wrote:  
> >>>>> I have a command like application which checks a directory tree
> >>>>> for certain things. If there are errors then messages will be
> >>>>> written to stdout.  
> 
>  > What I do here specifically is to check directory trees' file
>  > objects for user and group ownerships as well as permissions
>  > according to a given policy. There is a general policy and there
>  > could be exceptions which specify a specific policy for a certain
>  > file.  
> 
> If I have understood correctly, the objective is to check a dir-tree
> to ensure that specific directory/file-permissions are in-effect/have
> not been changed. The specifications come from a .JSON file and may
> be overridden by command-line arguments. Correct?
> 

Yes.

> There must be a whole 'genre' of programs which inspect a
> directory-tree and do 'something' to the files contained. I had a
> few, and needing another, decided to write a generic 'scanner' which
> would then call a tailored-function to perform the particular
> 'something' - come to think of it, am not sure if that was ever quite
> finished. Sigh!
> 
> 
>  >>>>> One idea was for the error situations to write messages to
>  >>>>> files and then later when running the tests to compare the
>  >>>>> error messages output to the previously saved output.
>  >>>>>
>  >>>>> Is there anything better?  
> 
> The next specification appears to be that you want a list of files
> and their stats, perhaps reporting only any exceptions which weren't
> there 'last time'.
> 

I just want to report violations against the policy(ies) given in the
JSON file.

> In which case, an exception could be raised, or it might be simpler
> to call a reporting-function when a file should be 'reported'.
> 
> I'm still a little confused about the difference between 
> printing/logging/reporting some sort of exception, and the need to 
> compare with 'history'.
> 

There is no comparison with history. I just want to see current
violations. (I think I phrased things badly in my previous post, so it
looked as if I wanted to see history.)

> 
> The problem with the "previously saved output" is the process of
> linking 'today's data' with that from the last run. I would use a
> database (but then, that's my background/bias) to store each file's
> stat.
> 

As said above, I phrased things badly in my previous post.


> Alternatively, if you only need to store exceptions, and that number is 
> likely to be small, perhaps output only the exceptions from 'this
> run' to a .JSON/.yaml file - which would become input (and a dict for
> easy look-ups) next time?
> 
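> A minimal sketch of that idea (the file-name and the shape of the
> exceptions structure are merely assumptions):
> 
> 	import json
> 
> 	def save_exceptions( exceptions, path="exceptions.json" ):
> 		'''Write this run's exceptions, keyed by file-path.'''
> 		with open( path, "w" ) as f:
> 			json.dump( exceptions, f, indent=4 )
> 
> 	def load_exceptions( path="exceptions.json" ):
> 		'''Load last run's exceptions as a dict for easy look-ups.'''
> 		try:
> 			with open( path ) as f:
> 				return json.load( f )
> 		except FileNotFoundError:
> 			return {}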
> 
>  >>>> Maybe I am wrong because I don't understand your scenario: If
>  >>>> your application is like a command, it has to return an error
>  >>>> code to the system, a distinct number for each error condition.
>  >>>> The error code is easier to test than stdout/stderr.  
> 
> Either way, you might decrease the amount of data to be stored by 
> reducing the file's stat to a code.
> 
> 
> >>>>> How to test this in the best way?  
> 
> The best way to prepare for unit-testing is to have 'units' of code 
> (apologies!).
> 
> An effective guide is to break the code into functions and methods,
> so that each performs exactly one piece/unit of work - and only the
> one. A better guide is that if you cannot name the procedure using
> one description and want to add an "and", an "or", or some other
> conjunction, perhaps there should be more than one procedure!
> 
> For example:
> 
>  >    for cat in cats:
>  >          ...
>  >          for d in scantree(cat.dir):
>  >              # if `keep_fs` was specified then we must
>  >              # make sure the file is on the same device
>  >              if cat.keep_fs and devid != get_devid(d.path):
>  >                  continue
>  >
>  >              cat.check(d)  
> 

Above is part of the main() function. But I could split that part of
main() into its own function, and the call to check a directory could
then be passed in as a function argument. For testing I could reuse
that function by providing a different function for the check
(currently it is cat.check(d)).
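
Something like this, as a sketch (scantree(), get_devid() and the cat
objects exist in my code already; the rest is just an assumption of
how it could look):

    def process_categories(cats, check):
        '''Scan each category's tree and apply `check` to every entry.'''
        for cat in cats:
            devid = get_devid(cat.dir)
            for d in scantree(cat.dir):
                # if `keep_fs` was specified then we must
                # make sure the file is on the same device
                if cat.keep_fs and devid != get_devid(d.path):
                    continue
                check(cat, d)

    # production run:
    #     process_categories(cats, lambda cat, d: cat.check(d))
    # in a test I can pass a stub which just records its arguments.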




> If this were a function, how would you name it? First of all we are 
> processing every category, then we are scanning a dir-tree, and
> finally we are doing something to each directory found.
> 
> If we split these into separate routines, e.g. (sub-setting the
> above):
> 
>  >          for d in scantree(cat.dir):  
>                 do_something_with_directory( d )
> 
> and you devise a (much) more meaningful name than mine, it will
> actually help readers (others - but also you!) to understand the
> steps within the logic.
> 
> Now if we have a function which checks a single file (fileNM) for
> 'whatever', the code will start something like:
> 
> 	def stat_file( fileNM ):
> 		'''Gather stat information for nominated file.'''
> 		etc
> 
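> For instance, a minimal sketch using os.stat (adjust the returned
> fields to whatever your policy actually compares):
> 
> 	import os, stat
> 
> 	def stat_file( fileNM ):
> 		'''Gather stat information for nominated file.'''
> 		st = os.stat( fileNM )
> 		return ( st.st_uid, st.st_gid,
> 			 stat.filemode( st.st_mode ) )
> 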
> So, we can now write a test-function because we don't need any 
> "categories", we don't need all the dir-tree, and we don't need to
> have performed a scan - all we need is a (valid and previously
> inspected) file-path. Thus (using Pytest):
> 
> 	def test_stat_file( ... ):
> 		'''Test file stat.'''
> 		assert stat_file( "...file-path" ) == ...its known-stat
> 
> 	def test_stat_dir( ... ):
> 		'''Test directory stat.'''
> 		assert stat_file( "...dir-path" ) == ...its known-stat
> 
> There is no need to test for a non-existent file, if you are the only
> user!
> 
> In case you hadn't thought about it, make the test path part of your
> test directory - not part of 'the real world'!
> 
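> For instance, a sketch assuming the stat_file() above and pytest's
> built-in tmp_path fixture (which supplies exactly such a per-test
> directory); the uid/gid assertions assume a POSIX system:
> 
> 	import os
> 
> 	def test_stat_file( tmp_path ):
> 		'''Test file stat.'''
> 		fileNM = tmp_path / "sample.txt"
> 		fileNM.write_text( "data" )
> 		os.chmod( fileNM, 0o644 )
> 		assert stat_file( fileNM ) == \
> 			( os.getuid(), os.getgid(), "-rw-r--r--" )
> 
> 	def test_stat_dir( tmp_path ):
> 		'''Test directory stat.'''
> 		dirNM = tmp_path / "sample"
> 		dirNM.mkdir()
> 		os.chmod( dirNM, 0o755 )
> 		assert stat_file( dirNM ) == \
> 			( os.getuid(), os.getgid(), "drwxr-xr-x" )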
> 
> This is why TDD ("Test-Driven Development") theory says that one
> should devise tests 'first' (and the actual code, 'second'). You must
> design the logic first, and then start bottom-up, by writing tests to
> 'prove' the most simple functionality (and then the code to 'pass'
> those tests), gradually expanding the scope of the tests as single
> functions are combined in calls from larger/wider/controlling
> functions...
> 
> This takes the previous time-honored practice of writing the code
> first, and then testing it (or even, 'throwing it over the wall' to a
> test department), and turns it on its head. (no wall necessary)
> 
> I find it a very good way of making sure that, when coding a system
> comprising many 'moving parts' and I make "a little change" to some
> component 'over here', the test-regime will quickly reveal any
> unintended consequences (effects, artifacts) 'over there' - even
> though my (very) small brain hadn't realised the impact! ("it's just
> one, little, change!")
> 
> 
> > The policy file is a JSON file and could have different categories.
> > Each category defines a policy for a certain directory tree. Command
> > line args could override directory names, as well as user and
> > group. Or if, for example, a directory is not specified in the JSON
> > file, I am required to specify it via the command line. Otherwise no
> > check can take place.
> > "default":
> >    {
> >        "match_perms":  "644",
> >        "match_permsd":  "755",
> >        "match_permsx":  "755",
> >        "owner":  "manfred",
> >        "group":  "manfred"
> >    }  
> 
> Won't a directory-path be required, to tie 'policy' to 'tree'?
> 

Yes, a directory path is required. If it is not present there, then it
must be supplied via command line arguments.
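
Roughly like this, as a sketch (the option names are made up; only the
JSON layout above is real):

    import argparse
    import json

    def load_policy(policy_file, argv=None):
        '''Load the JSON policy; command line arguments win.'''
        parser = argparse.ArgumentParser()
        parser.add_argument("--dir")
        parser.add_argument("--owner")
        parser.add_argument("--group")
        args = parser.parse_args(argv)

        with open(policy_file) as f:
            policy = json.load(f)
        default = policy["default"]
        # command line values override what the JSON file says
        for key in ("dir", "owner", "group"):
            value = getattr(args, key)
            if value is not None:
                default[key] = value
        if "dir" not in default:
            parser.error("no directory given - nothing to check")
        return policy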

> 
> Is that enough feedback to help you take a few more steps?

Yes, I think so. Thanks very much.


The comments/advice I got were pretty helpful. I have already started
to make improvements in my code to be able to test things better.

For example I have changed the check() method in the Policy class to
get called like this:

      def check(self, fpath, user, group, mode):

This means I can test the checks and their results without requiring
any external test data.
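
A test could then look something like this (a sketch, assuming check()
returns the violations it found, an empty list when the file
conforms, and that Policy takes the policy dict):

    def test_check_detects_wrong_mode():
        policy = Policy({"match_perms": "644",
                         "owner": "manfred", "group": "manfred"})
        assert policy.check("/any/path", "manfred", "manfred", 0o600)

    def test_check_accepts_conforming_file():
        policy = Policy({"match_perms": "644",
                         "owner": "manfred", "group": "manfred"})
        assert not policy.check("/any/path", "manfred", "manfred", 0o644)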



Thanks a lot to all for your help. 


-- 
Manfred


