[python-committers] Codecov and PR

Brett Cannon brett at python.org
Thu Apr 27 15:44:08 EDT 2017


On Wed, 26 Apr 2017 at 22:36 Terry Reedy <tjreedy at udel.edu> wrote:

> On 4/26/2017 1:45 PM, Brett Cannon wrote:
> >
> > On Tue, 25 Apr 2017 at 17:00 Terry Reedy <tjreedy at udel.edu
> > <mailto:tjreedy at udel.edu>> wrote:
>
> >     While I use code coverage to improve automated unittesting, I am
> >     opposed to turning a usable but limited and sometimes faulty tool
> >     into a blind robotic master that blocks improvements.  The prospect
> >     of this being done has discouraged me from learning the new system.
> >     (More on 'faulty tool' later.)
> >
> > It should be stated that code coverage is not a blocking status check
> > for merging from our perspective (the **only** required check is that
> > Travis pass with its test run).
>
> I have the impression that at one time you hoped to make it blocking.
> If that was wrong, I apologize for misunderstanding.  If you have
> changed your mind, then I am happy.
>

"Hope", sure; "currently plan to", no. IOW I have said a lot of things over
the 2.5 years I've been leading this workflow shift and I don't expect all
of them to pan out.


>
> I am otherwise in favor of both the measurement and report of coverage
> being improved.
>
> >     The temptation to write artificial tests to satisfy an artificial
> >     goal is real.  Doing so can eat valuable time better used for
> >     something else.
> >        For instance:
> >
> >           def meth(self, arg):
> >               mod.inst.meth(arg, True, ob=self, kw='cut')
> >
> >     Mocking mod.class.meth, calling meth, and checking that the mock is
> >     called will satisfy the robot, but does not contribute much to the
> >     goal of providing a language that people can use to solve problems.
>
> > My assumption is that there will be a test that meth() does the right
> > thing, mock or not. If we provide an API there should be some test for
> > it; using a mock or something else to do the test is an implementation
> > detail.
>
> My impression is that default mocks have a generic signature, so that
> merely checking that the mock is called will not catch an invalid call.
> I presume that one can do better with mocks, and I have with custom
> mocks I have written, but my point above was that coverage does not care.
>
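
Right -- a bare Mock will happily record any call.  Passing autospec=True
at least keeps the real signature, so a bad call blows up instead of merely
counting as coverage.  A rough sketch (the Helper and Caller classes below
just stand in for the mod.inst example above):

    import unittest
    from unittest import mock

    class Helper:
        # Stand-in for the class of mod.inst in the example above.
        def meth(self, arg, flag, ob=None, kw=''):
            pass

    inst = Helper()          # stand-in for mod.inst

    class Caller:
        def meth(self, arg):
            inst.meth(arg, True, ob=self, kw='cut')

    class MethTest(unittest.TestCase):
        def test_meth_call(self):
            # autospec=True gives the mock Helper.meth's real signature,
            # so calling it with the wrong arguments raises TypeError
            # instead of silently passing (while still "covering" the line).
            with mock.patch.object(inst, 'meth', autospec=True) as m:
                obj = Caller()
                obj.meth('value')
                m.assert_called_once_with('value', True, ob=obj, kw='cut')

    if __name__ == '__main__':
        unittest.main()
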
> >      >> If it's not important enough to require tests >> it's not
> >     important enough to be in Python.  ;)
> >
> >     Modules in the test package are mostly not tested. ;)
> >
> >
> > :) But they are at least executed, which is what we're really measuring
> > here and, I think, all that Ethan and I are advocating for.
>
> I thought Ethan was advocating for more -- a specific unittest for each
> line.
>
> > E.g. I don't expect
> > test_importlib to be directly responsible for exercising all code in
> > importlib, just that Python's entire test suite exercise importlib as
> > much as possible as a whole.
>
> The advantage for importlib in this respect is that import statements
> cannot be mocked; only the objects imported, after importlib is finished.
>

Oh, you can mock import statements. :)
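
For example, a test can plant a mock in sys.modules before the code under
test runs its import statement; a minimal sketch (the module name 'spam' is
made up):

    import sys
    import unittest
    from unittest import mock

    class ImportMockTest(unittest.TestCase):
        def test_import_is_mocked(self):
            fake = mock.MagicMock()
            # An import statement consults sys.modules first, so with the
            # mock planted there "import spam" binds the mock instead of
            # doing a real import.
            with mock.patch.dict(sys.modules, {'spam': fake}):
                import spam
                self.assertIs(spam, fake)

    if __name__ == '__main__':
        unittest.main()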


>
> There is lots of interaction between idlelib modules, but I would still
> like direct tests of each idlelib.xyz with a test_xyz.py.  Three years
> ago, there was no test.test_idle.  There now is, and it runs 35
> idlelib.idle_test.test_* files.  (There are 60 idlelib modules.)
>
> > The problem I have with just doing manual testing for things that can
> > easily be covered by a unit test -- directly or indirectly -- is there
> > are technically 85 people who can change CPython today. That means
> > there's potentially 85 different people who can screw up the code ;) .
>
> At the moment, I am the only one pushing idlelib patches, except when it
> gets included in one of Serhiy's multi-module refactoring patches (and
> he always nosies me).  That skews my view a bit.  However, with most of
> the critical issues fixed, I am anxious to automate what I can of what I
> now do by hand.
>
> > Making sure code is exercised somehow by tests at least minimizes how
> > badly someone like me might mess something up thanks to me not realizing I
> > broke the code.
>
> I had not thought about the issue that way.  I should add a test_module
> for each remaining module, import the module, and at least create an
> instance of every tkinter widget defined therein, and see what other
> classes could be easily instantiated and what functions easily run.
>

That seems like a good starting point. Kind of like test_sundry but with
class instantiation on top of it.
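
Roughly what that could look like, assuming a hypothetical idlelib module
xyz with a widget class XyzFrame (each real module would substitute its own
names):

    import unittest
    from test.support import requires
    from tkinter import Tk

    from idlelib import xyz          # hypothetical module name

    class XyzSmokeTest(unittest.TestCase):
        """Import the module and at least instantiate its widget(s)."""

        @classmethod
        def setUpClass(cls):
            requires('gui')          # skipped where the gui resource is absent
            cls.root = Tk()
            cls.root.withdraw()

        @classmethod
        def tearDownClass(cls):
            cls.root.update_idletasks()
            cls.root.destroy()
            del cls.root

        def test_instantiate(self):
            widget = xyz.XyzFrame(self.root)   # hypothetical widget class
            self.assertTrue(widget.winfo_exists())

    if __name__ == '__main__':
        unittest.main(verbosity=2)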


> >     Some practical issues with coverage and Codecov:
>
> >     2. Some statements are only intended to run on certain systems,
> >     making 100% coverage impossible unless one carefully puts all
> >     system-specific code in "if system == 'xyz'" statements and uses
> >     system-specific .coveragerc files to exclude code for 'other' systems.
>
> > True. We could have a discussion as to whether we want to use
> > Coverage.py's pragmas ... (I'm sure we could discuss things with Ned
> > Batchelder if we needed some functionality in coverage.py for our needs).
>
> Let's skip this for now.
>
> >     3. Some tests require extended resources.  Statements that are only
> >     covered by such tests will be seen as uncovered when coverage is run
> >     on a system lacking the resources.  As far as I know, all non-Windows
> >     buildbots and Codecov are run on systems lacking the 'gui' resource.
> >     So patches to gui code will be seen as uncovered.
> >
> > I view 100% coverage as aspirational, not attainable. But if we want an
> > attainable goal, what should we aim for? We're at 83.44% now
>
> On what system?


Travis, where the Codecov run is driven from.


>   I suspect that Tkinter, ttk, turtle, and IDLE
> GUI-dependent tests make at least a 2% difference on GUI Windows versus
> no-GUI *nix.


> > we could say that 80% is something we never want to drop below and be
> > done with it. We could up it to 85% or 90%, recognizing that there is more
> > testing to do but that we will never cover all Python code (all of this
> > is configurable in Codecov, hence why I'm asking).
>
> Since I think we actually are at 85%, and certainly will be when I add
> minimal easy tests for the rest of idlelib, I think 90% would be a
> reasonable goal.
>

Seems reasonable to me. Opened
https://github.com/python/core-workflow/issues/75 to remind me to tweak
Codecov's config so that we are aiming for 90% overall.
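
The tweak itself should be tiny; in codecov.yml terms it would be something
roughly like this (exact keys to be double-checked against Codecov's docs):

    coverage:
      status:
        project:
          default:
            target: 90%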

-Brett


>
> One way to increase coverage is to push a bit harder on fulfilling the
> 'test needed' stage.  Theoretically, every substantive
> (behavior-changing) patch should start with a test that fails.  Since
> PRs are separate from the main repository and can be patched separately,
> a PR could start with a test that should immediately fail but should
> pass before merging.  It would be nice if the test runner knew to only
> run the new test and not the entire suite.  It would be even nicer if it
> knew that initial failure is success.  Is there at least a 'New Test'
> label on PRs?
>
> >     4. As I explained in a post on the core-workflow list, IDLE needs the
> >     following added to the 'exclude_lines' item of .coveragerc.
> >           .*# htest #
> >           if not _utest:
>
> These additions would remove, I think, at least 400 lines from the
> uncovered category.  Both only occur in idlelib.
>
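
For reference, that should amount to two more regexes under exclude_lines in
the [report] section, along these lines (re-listing 'pragma: no cover'
because setting exclude_lines replaces coverage.py's built-in default):

    [report]
    exclude_lines =
        pragma: no cover
        .*# htest #
        if not _utest:
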
> --
> Terry Jan Reedy
>