[Tutor] OT: "Your tests are only as good as your mocks." Comments?

dn PyTutor at DancesWithMice.info
Sun Jul 25 22:55:41 EDT 2021


On 26/07/2021 05.14, boB Stepp wrote:
> From
> https://swizec.com/blog/what-i-learned-from-software-engineering-at-google/#stubs-and-mocks-make-bad-tests
> 
> 
> The author of this article notes an example from his practice where
> his mock database that he used in his tests passed his tests when
> the actual code in production no longer had a database column that
> was in his mock database.  As I have begun to play around with
> databases recently and how to test code relying on them, this really
> caught my attention.
> 
> The overall article itself is a recap of what he read in a book
> about how Google does things ("Software Engineering at Google").  In
> this situation Google advocates for using "fakes" in place of mocks,
> where these fakes are simplified implementations of the real thing
> maintained by the same team to ensure API parity.  How would the
> development and maintaining of these fakes be done so that the fakes
> don't drift from coding reality like the mocks might?  It is not
> clear to me exactly what is going on here.  And a more
> Python-specific question:  Does the Python ecosystem provide tools
> for creating and managing fakes?


It's an amusing story, and one which the author identified as
particularly relevant in larger organisations - but (almost) irrelevant
in a one-man band.

Thus, there seem to be two components to the question(s) and the
thinking behind them: firstly, the way teams and corporations operate;
and secondly, Python tools which support something recommended by an
organisation (which may, or may not, be used by their own Python
teams). Faking is newer than the more established techniques of stubs
and mocks. Accordingly, it is generating a lot of heat, but we have yet
to see if there will be much light! We'll get to that, but let's start
with your interest in moving beyond the sole-coder role into a
professional dev.team environment:-


Issue 1: Pride (as in, "...goeth before...")
There is a mystique to working for a "FAANG company" (per the article)
that somehow translates into a Queen sound-track ("We Are the
Champions"). In the ?good, old, days we referred to "Blue Chip
companies" and thought them good, eye-catching content for one's resume
(been there, done that, t-shirt too ragged to wear). However, the
reality is that they (and their work-methods) are indeed unlike most
others'. Whether they are better, or not, is up for debate... (A
further assumption is that all components of 'the organisation' are
equally magnificent. In reality, departments/projects differ widely
from each other - ranging from those which shine brightly, to those
which challenge the proverbial pig-sty for churned-up mud and olfactory
discomfort.) Just because their employees think they're 'great' doesn't
mean that their approach will suit any/all of the rest of us.

Issue 2: Arrogance
Within an organisation, certain team-leaders attempt to build 'unity'
through a them-and-us strategy. Like the above, this tends to engender
a 'we are better than them' attitude, which in turn amplifies any point
of difference, often to the point of interfering with, or preventing,
inter-communication. These days I'd probably be pilloried for it (are
HR allowed to use 'cruel and unusual punishment'?), but, with my
Project Rescue PM-hat on (Project Manager), I have walked into
situations like this and, all other efforts failing, mandated that
teams get themselves into the same room and 'hash things out'
(professionally), or ... I would start "banging heads together"
(unprofessionally) - or worse... And yes, I've suffered through
scenarios involving the DB-team not speaking with the dev.teams
attempting to 'read' or 'write'. Sigh! (In fact, thinking of the wasted
time/money: BIG SIGH!) See also the author's comment about "Hyrum's
Law" - which can only be magnified when team inter-communication
dwindles.

Issue 3: Metrics
There is a rule of human nature that if some measurement is being used,
work-practice will adapt to maximise on that point. Some of us will
remember the idea that 'good programmers' wrote more LoC ("Lines of
Code") per day than others. If you were being measured on that, would
it be better to write a three~five-line for-loop block or a single-line
list-comprehension? How many of us really think that shipping our
working code "minutely" (per the article) is even remotely a good idea?
Perhaps we value our reputations? Tell me again: who claims to "do no
harm"? Is there an emphasis on ensuring and assuring tests (at all
levels) if there is a rush to 'production'? (This attitude to testing
has been a problem, in many and varied forms, for as long as there has
been programming.)
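
To put that LoC question in concrete terms, the two snippets below do
identical work, yet one 'scores' four lines and the other only one.
Which would you write, were your pay-cheque riding on the count?

    # Four 'Lines of Code':
    squares = []
    for number in range(10):
        if number % 2 == 0:
            squares.append(number ** 2)

    # One 'Line of Code' - identical result:
    squares = [number ** 2 for number in range(10) if number % 2 == 0]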

Issue 4: the Future is a rush!
Whilst it is undeniably exciting, the problem with racing forwards is
that it becomes difficult to anticipate where (future) problems lie. It
is fair to say that no-one can predict the future (least-wise not with
'20/20 vision'). However, Santayana's aphorism also applies: "Those who
cannot remember the past are condemned to repeat it". In this case, see
"The Mythical Man-Month" (Brooks), which recounts how adding more
personnel to a 'late' project actually had the opposite effect to that
intended. This applies to the author's descriptions of adding too many
new staff (to anything) in an uncontrolled, indeed uncontrollable,
fashion: people don't know each other, responsibilities keep shifting,
and communication fractures, becomes difficult/impossible, and dries
up! Is this a technical problem or a management failing?

Issue 5: Solving social problems with 'technical solutions'
Which neatly ties together much of the above: yes, we've probably all
experienced the 'please upgrade' issue, and the laggards' 'can I
upgrade from so-many-versions-ago to the-new-version all at-once?'
plea. However, just as the author comments (earlier in the article)
about 'engineers losing control', so too will users! People's feelings
represent a huge proportion of their willingness/decision to install,
use, and continue to use, an application. There are ways to talk to
people - nay, to make the point: "ways to talk WITH people"!
Accordingly, 'here' at Python, we have a current version of 3.9... yet
cheerfully engage with folk using earlier releases - even (albeit with
some alarm) those who are somehow compelled to stay with Python 2!


With such critique in-mind, let's look at practicalities:-

You (and the author) are quite right: such faults will not be
discovered in what is often 'the normal course' of a team's/an
individual's work-flow! However, remember that one should not
(unit-)test to stubs/mocks/interfaces, but test to values! If your code
is to divide two numbers, it had better test for 'that zero problem' -
but who cares from where the data has been drawn?
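
To illustrate (a minimal sketch - safe_ratio is an invented example):
the tests below care about the values involved, not about where the
operands came from:

    import unittest

    def safe_ratio(numerator, denominator):
        """Divide two numbers, guarding against 'that zero problem'."""
        if denominator == 0:
            raise ValueError("denominator must be non-zero")
        return numerator / denominator

    class TestSafeRatio(unittest.TestCase):
        def test_ordinary_division(self):
            self.assertEqual(safe_ratio(10, 4), 2.5)

        def test_zero_denominator(self):
            # The *value* is what matters - not whether it came from a
            # database, a mock, or a keyboard!
            with self.assertRaises(ValueError):
                safe_ratio(1, 0)

    if __name__ == "__main__":
        unittest.main()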

If I happen to be doing the DB work, using 'my' (project's) Git repo,
and you are writing app.code within 'your' Git repo, we can both unit
test "until the cows come home" - and never, ever find such an 'error'
as the author described. Unit tests must, by definition, come up short.
Such investigation is (a part of) the province of "Integration
Testing". Who manages that part of the author's CI/CD process? Answer:
not the DB-team, not the app.devs, ... Who then? Oops!
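
By way of illustration (a sketch only - the table, column, and function
names are invented, and SQLite stands in for the production DB): an
integration-level test which runs the application's actual query
against the current schema will trip over a dropped column, where a
hand-rolled mock would sail straight past:

    import sqlite3
    import unittest

    # The application code under test - note that it names a real column.
    def fetch_emails(conn):
        return [row[0] for row in conn.execute("SELECT email FROM users")]

    class TestFetchEmailsIntegration(unittest.TestCase):
        def test_query_matches_current_schema(self):
            # In real life: a staging copy built from the DB-team's
            # *current* DDL, not from our recollection of it.
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
            conn.execute("INSERT INTO users VALUES (1, 'bob@example.com')")
            # Should the DB-team drop/rename 'email', this raises
            # sqlite3.OperationalError - a mock would never notice.
            self.assertEqual(fetch_emails(conn), ["bob@example.com"])

    if __name__ == "__main__":
        unittest.main()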


FYI At one time I considered a technical answer to this issue and
thought that MySQL's in-memory DB-engine might serve, ie by putting
tables/table-stubs into memory, whereby the speed-increase might
counter the disadvantage of using a 'real' DB. It wasn't suitable,
largely because of the limitations on which data-types can be handled -
and thus it failed to enable a realistic replacement of the 'real
data'.
(https://dev.mysql.com/doc/refman/8.0/en/memory-storage-engine.html)
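
For what it's worth, the Python standard library offers an analogous
trick: sqlite3's ":memory:" databases live entirely in RAM, and are
fast enough to stand in for a 'real' DB in many test-suites - with much
the same caveat about data-types (SQLite's 'loose' type affinities will
not match a production MySQL/PostgreSQL server exactly):

    import sqlite3

    # An entire, disposable database living in memory - created and
    # torn down per test-run, so tests stay fast and independent.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (item TEXT, cents INTEGER)")
    conn.execute("INSERT INTO prices VALUES (?, ?)", ("widget", 199))
    print(conn.execute("SELECT cents FROM prices").fetchone())  # (199,)
    conn.close()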


There is always going to be a problem with modelling - you'd think that
as we do this all-day, every-day, we-computer-people would consider
this. Do we? Adequately?

A mock is a mimic - similar but not the same, and quite possibly
over-emphasising certain aspects (even at the risk of minimising others).

A stub is an abbreviated form - 'stuff' has, by definition, been left-out.

These are 'judgement calls'. Do we sometimes get these 'wrong'?

Remember that should you stub your toe there may be others prepared to
mock your clumsiness. (Yuk!)

It is easy to assume that a tool is offering us 'more' than it really
is - easy to assume that we have 'everything covered' when we don't.
After all, isn't that our fate: that no matter how much testing we
perform, there will always be one user who can find some combination of
circumstances we did not foresee...
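
On the Python-specific question: unittest.mock does offer some
protection against such over-trust, via its spec= argument and
create_autospec(), which constrain a mock to the mocked object's
current interface. A sketch (the Database class is invented):

    from unittest import mock

    class Database:
        """Stands in for the real DB-access class."""
        def fetch_user(self, user_id):
            ...

    # A bare Mock cheerfully answers *any* call - including calls the
    # real class has never supported:
    loose = mock.Mock()
    loose.fetch_customer(42)       # passes silently: a lie!

    # An autospec'd mock is rebuilt from the class as it is *today*:
    strict = mock.create_autospec(Database, instance=True)
    strict.fetch_user(42)          # fine - matches the real API
    try:
        strict.fetch_customer(42)  # no such method!
    except AttributeError as exc:
        print("caught the drift:", exc)

Note the limitation though: autospec checks only against the Python
class as it stands; if that class and the database schema have
themselves drifted apart, we are back to the author's story.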

Finally, in this cynical observation of 'real life', there was talk of
"fakes are simplified implementations of the real thing maintained by
the same team to ensure API parity". Which is "the same team"? The
DB-guys who are only interested in their work, and who work to no metric
which involves helping 'you'? Your team - the ones who have no idea that
the DB-team have 'shifted the goal-posts'? Whither "API parity"? It may
be that instead of discovering that the information used to build the
mock is now out-of-date, all that happens is that you (belatedly)
discover that the "fake" is too-fake...  The deck-chairs have been
rearranged and given new labels ("mock", "fake"), but they are still on
the SS Titanic!
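
For completeness: should one go the "fakes" route anyway, the usual
discipline is a shared 'contract' test-suite which the real
implementation and the fake must BOTH pass - parity asserted, rather
than assumed. A sketch (all names invented):

    import unittest

    class UserStoreContract:
        """Tests every implementation - real or fake - must pass."""
        def make_store(self):
            raise NotImplementedError

        def test_round_trip(self):
            store = self.make_store()
            store.save(1, "bob@example.com")
            self.assertEqual(store.load(1), "bob@example.com")

    class FakeUserStore:
        """The 'fake': a simplified, in-memory implementation."""
        def __init__(self):
            self._rows = {}

        def save(self, user_id, email):
            self._rows[user_id] = email

        def load(self, user_id):
            return self._rows[user_id]

    class TestFakeUserStore(UserStoreContract, unittest.TestCase):
        def make_store(self):
            return FakeUserStore()

    # The DB-team runs the very same contract against the real thing:
    # class TestRealUserStore(UserStoreContract, unittest.TestCase):
    #     def make_store(self):
    #         return RealUserStore(staging_db_connection())

    if __name__ == "__main__":
        unittest.main()

The moment the two implementations diverge, the contract suite - not a
production user - is the one to complain.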


So, then, what is the answer? (I'm not sure about the "the"!)

Once again, back in the ?good, old, days (oh, no, here he goes
again...) - which included the "waterfall approach" to systems
development - we called it "Change Control". No-one was allowed to make
'changes' to a system without appropriate documentation giving notice
to all concerned!

Today, imagine a stand-up SCRUM/meeting at which I (ever so grandly)
announce that today I shall be coding a change to the database, adding
a new field, and it will all be so-wonderful. You prick up your ears
and ask for more info - will it affect your application-code? We agree
to 'meet afterwards', the change is reviewed, and it is either reversed
or accommodated.

No nasty surprise AFTER we both thought 'job done'! How well we work
together!


Some people think of (unit- and integration-) testing as 'extra work'.
After all, it is the application-code which 'gets the job done'! One of
the contributions of TDD is that testing is integral to development, to
proving, and to maintenance/refactoring. Accordingly, as much care
should be invested in the testing routines as in the application's!

Whether we use mocks, stubs, fakes, or you-name-it, there are no
guarantees. Each must be used with care. Nothing can be taken for-granted!

A tool being used at one 'layer' of the testing process cannot really
be expected to cross layers - even if some 'higher' layer of testing
uses the same tool. How one layer is tested is quite different to the
objectives of testing a higher/lower layer!

NB I don't see one of these as 'the tool to rule them all'. Each has its
place - use 'the best tool for the job'.


A number of references/allusions have been included here, because I know
the OP likes to 'read around' and consider issues wider than mere syntax.

Having survived this far, if you enjoy our favorite language's
occasional allusion to the Monty Python series, you will likely also
enjoy Terry Pratchett's (fictional) writings. Herewith a couple of
pertinent quotes, the first about 'planning' (planning testing?), and
the second referencing our industry's habit of making great promises
(and assumptions) but being a little short on delivery against others'
(users') expectations (as per the article?):-

"Plan A hadn't worked. Plan B had failed.  Everything depended on Plan
C, and there was one drawback to this: he had only ever planned as far
as B."

"Crowley had been extremely impressed with the warranties offered by the
computer industry, and had in fact sent a bundle Below to the department
that drew up the Immortal Soul agreements, with a yellow memo form
attached just saying:‘Learn, guys.'"
(both from "Good Omens")
-- 
Regards,
=dn

