[py-dev] utest thoughts

Ian Bicking ianb at colorstudy.com
Fri Oct 1 20:05:32 CEST 2004


I thought I'd split this up, but most of it comes down to the same 
subject -- how to find tests, how to annotate tests, how to select 
tests, and those are all kind of the same problem.  Well, that and some 
more minor details...

holger krekel wrote:
> Yes, selecting test methods by (maybe wildcarded) expressions seems
> reasonable. However, i would like to first implement a 
> "stateful testing" approach which would basically work like this: 
> 
>     py.test --session 
>     ... runs all tests, some of which fail ... 
>    
>     py.test --session 
>     ... runs only the previously failed tests ... 
> 
>     repeat the last step until all tests pass, 
>     then run all tests --> go to step 1
> 
> I somehow think that this might satisfy many use cases which otherwise
> would require some wildcard based selection scheme. 

That would definitely be a nice feature.  There are still some use cases 
for wildcards.  One would be when you are working in a large package and 
make a localized change: you don't want to spend the time running the 
whole suite before you get to the tests for the code you just changed; 
maybe you'd run all the tests later, but you want to start with specific 
tests and then expand once those pass.

Another is TDD; in that case you may be writing many tests that are 
expected to fail.  You probably want to address the tests in a specific 
order, and not filter through all the other tests at the same time.
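
To make the wildcard idea concrete, selection could be as simple as 
matching against dotted test names.  This is just a sketch -- the 
(name, callable) collection format and the --match option are invented:

import fnmatch

def select_tests(collected, pattern):
    # 'collected' is a hypothetical list of (name, callable) pairs,
    # e.g. ('std.path.test_local.test_listdir', <function>)
    return [(name, test) for name, test in collected
            if fnmatch.fnmatch(name, pattern)]

# so something like 'py.test --match="*listdir*"' would boil down to
# running only select_tests(all_tests, '*listdir*')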

> py.test consists of three main elements when "running" tests: 
> the collector(s), the reporter (for producing all output), 
> and the runner, driving the collector and the reporter. Collectors 
> are completely decoupled from running tests. They are implemented 
> (std/utest/collect.py) via a concept called "Restartable Iterators", 
> thus the runner doesn't need to know all the tests when starting
> to run them.  
> 
> I know that unittest.py approaches often just put all these 
> three elements together and call it a "runner" but i think 
> it blurs the lines and makes it very difficult to keep it
> extensible and customizable. 
> 
> Having said this, for having multiple data sets it's probably best
> to introduce a mechanism into the Module Collector to look for 
> custom module-defined collectors which would seamlessly integrate 
> into the collection process of Units.  By implementing your own 
> collection function you can yield many Units with different 
> test data sets which are then executed/accounted for individually. 

Could this be like:

data = [(1, 'one'), (2, 'two'), ...]
def test_collector():
    for args in data:
        # Depending on when tests are run, I think leaving out args=args
        # could silently make all your tests use the same data :(
        yield lambda args=args: test_converter(*args)
# Should we then do test_collector = test_collector() ?

# Maybe test_converter is a bad name, because the runner will find it...
# or if test_collector is present, maybe the runner won't look any
# further in this module; but sometimes we will want it to look through
# the module...
def test_converter(number, english):
    assert make_english(number) == english


That would be pleasingly simple.  I'm hoping we can avoid exposing 
complex interfaces to the test code.  It gets more complicated if you 
can address the tests individually, e.g., via wildcards or keywords. 
How to add that information to the individual tests?
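
One possibility that keeps the test module simple: the collector gives 
each generated test an addressable name.  Everything here is invented, 
and make_english is the function under test from the sketch above:

data = [(1, 'one'), (2, 'two')]

def test_collector():
    for number, english in data:
        def check(number=number, english=english):
            assert make_english(number) == english
        # name the generated test so that wildcards/keywords could
        # select e.g. 'test_converter[2]' on its own
        check.__name__ = 'test_converter[%s]' % number
        yield check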

>>* Specifying an option to the runner that gets passed through to the 
>>tests.  It seems like the options are fixed now.  I'd like to do 
>>something like -Ddatabase=mysql.  I can do this with environment 
>>variables now, but that's a little crude.  It's easiest if it's just 
>>generic, like -D for compilers, but of course it would be nicer if there 
>>were specific options.  Maybe this could be best achieved by 
>>specializing utest and distributing my own runner with the project.
> 
> 
> I see the point.  This should be done by reexamining the current
> 'config' mechanism. Or better: rewriting it altogether as it is very
> much ad hoc/preliminary.  There is a provision to have utest configuration 
> files currently called utest.conf where you can at the moment only 
> set some default values for command line options.  
> 
> with unittest.py there is a common habit of simply replacing 
> the "runner" while with py.test it's most often better to write
> just a custom reporter or a custom collector. 
> 
> Eventually, the py.test-config file should make it possible to have specific 
> collectors, reporters and maybe even runners for subdirectories. 

Hmm... this would address certain issues.  For instance, if you're doing 
functional tests on a web app, you might configure what URL the app is 
locally installed at.

In the case I'm thinking of, where I run the identical set of tests on 
different backends, it should be available as a command-line argument. 
But if there's a generic command-line argument (like -D) then that could 
be used to set arbitrary options (assuming the config file can accept 
arbitrary options).
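
Concretely, the -D handling could be as dumb as this (all names invented):

def parse_defines(args):
    # turn ['-Ddatabase=mysql', '-Dapp_url=http://localhost:8080/']
    # into a dict the config object (and thus the tests) can read
    defines = {}
    for arg in args:
        if arg.startswith('-D') and '=' in arg:
            name, value = arg[2:].split('=', 1)
            defines[name] = value
    return defines

# e.g. parse_defines(sys.argv[1:]).get('database', 'sqlite')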

> Architecting such a federated scheme of collectors/runners/reporters 
> is not easy but i think very much worth it.  
> 
> Note that it should always be possible to run tests of any
> application with py-tests by invoking 'py.test APP-DIRECTORY' or 
> simply 'py.test' while being in the app directory.  

What about "python setup.py test" ?  This would allow for a common way 
to invoke tests, regardless of runner; people could use this for their 
unittest-based tests as well, or whatever they are using.

I think I tried this at one point, but then got bored of trying to 
figure out the distutils just to add this one little command.
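
For the record, the one little command is roughly this much code.  This 
is only a sketch; it just shells out to whatever runner the project uses:

from distutils.core import Command

class test(Command):
    description = "run the package's tests"
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        # shelling out keeps this runner-agnostic; a real command
        # might import and call the runner directly instead
        import os
        status = os.system('py.test')
        raise SystemExit(status != 0)

# then in setup.py:  setup(..., cmdclass={'test': test})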

> At some point we 
> may look into providing direct support for unittest.py style tests
> to allow a seamless "upgrade". But this may be extremely messy 
> with all those unittest.py hacks around. 
> 
>>* I'm not clear how doctest would fit in.  I guess I could turn the 
>>doctest into a unit test TestCase, then test that.  Of course, it would 
>>be nice if this was streamlined.  I also have fiddled with doctest to 
>>use my own comparison functions when testing if we get the expected 
>>output.  That's not really in the scope of utest -- that should really 
>>go in doctest.  Anyway, I thought I'd note its existence.
> 
> 
> Yes, well noted.  I think everyone agrees that integrating doctest 
> is a good idea. I am just not using them currently but i like the
> idea.  It's open how to integrate doctests into py.test.  I guess
> the rough idea is to extend the current collectors to look for 
> docstrings and they would generate specific Units whose execute() 
> method is invoked by our runner. The DoctestUnit.execute method 
> would run a doctest.  Probably, this also requires extending 
> the TextReporter to support some nice kind of failure output. 

In Zope3 they explicitly add doctests to the testing, it isn't just 
automatically picked up.  In part because the doctests are typically in 
modules that aren't otherwise inspected for tests (they are inline with 
the normal code, not in separate test_* modules).  I think there may be 
a performance issue with inspecting all modules for doctests, and 
potentially an issue of finding things that look like tests but aren't 
(though that probably isn't a big problem, since there are ways to exclude 
docstrings from doctest).
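
As a sketch of the explicit approach (the hook name and how it plugs 
into the collector are invented):

import doctest

def collect_doctests(modules):
    # yield one test callable per module; it fails if any doctest
    # example in that module fails
    for module in modules:
        def run(module=module):
            failures, tried = doctest.testmod(module, verbose=False)
            assert failures == 0, "%d of %d doctest examples failed in %s" % (
                failures, tried, module.__name__)
        yield run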

>>* Code coverage tracking.  This should be fairly straight-forward to add.
> 
> 
> yes.  I guess everybody uses sys.settrace which is
> unfortunately rather expensive. Anyway, it would be nice to
> come up with some real life use cases and see what is really
> useful. I am not clear on that. 

I think the "50% code coverage" is mostly a feel-good measure, so you 
can be pleased with your increasing score as you add tests.  It would be 
awesome to allow for leveling up with your tests.  "20% code coverage; 
you have begun your travels" or "95% code coverage; you are approaching 
enlightenment".

The actual file-by-file reports are more useful, I think.  Coverage 
should only be tracked when explicitly asked for.
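
For reference, the sys.settrace approach needs only a few lines, which 
also shows why it's expensive -- the trace function fires for every 
line executed.  A sketch:

import sys

executed = {}   # (filename, lineno) -> True

def tracer(frame, event, arg):
    if event == 'line':
        code = frame.f_code
        executed[(code.co_filename, frame.f_lineno)] = True
    return tracer

def run_with_coverage(test):
    sys.settrace(tracer)
    try:
        test()
    finally:
        sys.settrace(None)

# a report would then compare 'executed' against the lines that could
# have been executed, file by file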

>>* Different levels of tests (-a --at-level or --all; default level is 1, 
>>which doesn't run all tests).  They have lots of tests, so I'm guessing 
>>they like to avoid running tests which are unlikely to fail.
> 
> 
> having different levels of tests seems interesting. I'd like
> more of a keyword based approach where all tests are
> associated with some keywords and you can select tests by
> providing keywords.  Keywords could be automatically
> associated from filename and python name components e.g.
> ['std', 'path', 'local', 'test_listdir'].  You could
> additionally associate a 'slow' keyword with some tests (somehow) 
> and start 'py.test --exclude="slow"' or put this as a default in
> the configuration file. 

That would work well, I think.  How might tests be annotated?  On a 
module-by-module basis, using some particular symbol 
(__test_keywords__)?  Function attributes?  On an ad hoc basis by 
customizing the collector?
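
A function-attribute version might look like this (the attribute and 
helper names are invented); the collector would merge these with the 
automatically derived keywords:

def keywords(*words):
    # tag a test function with extra keywords via a function attribute
    def mark(func):
        func.test_keywords = list(words)
        return func
    return mark

def test_copy_big_tree():
    assert 1 + 1 == 2   # stand-in for a slow filesystem test

test_copy_big_tree = keywords('slow', 'io')(test_copy_big_tree)

# the collector could combine ['std', 'path', 'test_copy_big_tree']
# with ['slow', 'io'], and 'py.test --exclude=slow' would skip it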

It's actually the kind of place where adaptation would be interesting; 
objects would be adapted to test cases, where part of the test case API 
was a set of keywords.  That would allow for a lot of customization, 
while the actual tests could remain fairly simple.  Part of the base of 
py.test would be adapters for packages, modules, and functions; the 
module adapter looks for the test_* functions, function adapters might 
look for function attributes, etc.  There'd be another adapter for 
unittest.TestCase and unittest.TestSuite, and so on.  Packages could 
create their own adapters for further customization.
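
A crude version of that, without any real adaptation framework 
(everything here is invented):

import types

# maps a type to a function turning an object of that type into a list
# of (keywords, callable) pairs -- the minimal "test case API"
adapters = {}

def register(cls, adapter):
    adapters[cls] = adapter

def adapt(obj):
    for cls, adapter in adapters.items():
        if isinstance(obj, cls):
            return adapter(obj)
    return []

def adapt_function(func):
    kw = [func.__module__, func.__name__] + getattr(func, 'test_keywords', [])
    return [(kw, func)]

register(types.FunctionType, adapt_function)
# a package could register further adapters for its own test classes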

>>* A distinction between unit and functional tests (as in acceptance or 
>>system tests).  This doesn't seem very generic -- these definitions are 
>>very loose and not well agreed upon.  There's not even any common 
>>language for them.  I'm not sure how this fits in with level, but some 
>>sort of internal categorization of tests seems useful.
> 
> 
> maybe use the keyword approach for this, too? 

Seems like a good use case.

>>* A whole build process.  I think they run out of the build/ directory 
>>that distutils generates.  It is a little unclear how paths work out 
>>with utest, depending on where you run it from.  Setting PYTHONPATH to 
>>include your development code seems the easiest way to resolve these 
>>issues with utest.  I don't have anything with complicated builds, so 
>>maybe there are issues I'm unaware of.
> 
> 
> Simply providing a distutils-install should pose no problem and
> we should do it; however, there are a couple of interesting issues here: 
>
>     - armin and I want a mechanism by which to include '.c'
>       files in the library and seamlessly compile (and 
>       possibly cache) them via the distutils mechanism.  This
>       should make it possible to work with an svn checkout containing 
>       c-coded modules without any explicit user interaction 
>       and especially without distutils-installing it. 

Right, this is what Zope is doing.  It builds the package (but does not 
install it) before running the tests (python setup.py build).  Then it 
runs the tests out of the build/ directory.  The build is, I think, 
relatively fast (after the first time it is run); it only updates things 
according to timestamps.
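
In Python terms the trick is just to prefer the build directory over 
the source tree when collecting tests (assuming the standard distutils 
build layout):

import sys, os, glob

def use_build_dir():
    # put build/lib or build/lib.<platform>-<version> first on sys.path
    for path in glob.glob(os.path.join('build', 'lib*')):
        sys.path.insert(0, path)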

>     - managing the py library versions: in the long run, i'd 
>       like to have an easy automated way for users of the py lib
>       to get/install a new version, preferably via a 'py' binary
>       which also allows asking 'py --version', 'py --selftest' 
>       to let the py-lib tests run, or 'py --test' which would iterate 
>       into all modules/packages to find tests and run them. This is 
>       a kind of integrity test for your system. Also 'py someprogram.py' 
>       could start an (interactive) python interpreter allowing 
>       the script to choose/restrict its version (on all platforms). 
> 
>     - eventually i'd like to think some more about the notion 
>       of a 'py' application which would be installable/manageable 
>       probably connecting to the PyPI project but including 
>       downloads. 

Is there a reason these separate concerns go together?  The last seems 
like a distutils enhancement.  Handling multiple versions... well, 
that's another issue that is pretty much unaddressed at this point, but 
I'm not sure

>
>>* Run tests in a loop (-L --loop).  Also for checking memory leaks. 
>>I've thought that running loops of tests in separate threads could also 
>>be a useful test, for code that actually was supposed to be used with 
>>threads.  That might be another place for specializing the runner.
> 
> 
> Yes, see above. Maybe 'py.test --session' could actually not return 
> until all tests pass and wait for files to change in order to try again. 
> This is really nice because you can just save from your editor and 
> see the tests running (at least if you are working in multiple windows 
> or on Xinerama like i do :-) 

I think for a text reporter this would lead to information overload.  In 
Zope I assume they'd only use this once all tests passed, as a way of 
exercising the C code.
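
The threaded-loop idea I mentioned above could live entirely in a 
specialized runner, something like this sketch:

import sys, threading

def run_in_threads(test, nthreads=4, iterations=100):
    # hammer a single test callable from several threads at once; any
    # exception in any worker fails the whole run
    errors = []
    def worker():
        try:
            for i in range(iterations):
                test()
        except Exception:
            errors.append(sys.exc_info()[1])
    threads = [threading.Thread(target=worker) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert not errors, errors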

>>* Keep bytecode (-k --keepbytecode).  Not interesting in itself, but it 
>>implies that they don't normally keep bytecode.  I expect this is to 
>>deal with code where the .py file has been deleted, but the .pyc file is 
>>still around.  I've wasted time because of that before, so I can imagine 
>>its usefulness.
> 
> 
> yes, there are some weird problems with py.test sometimes that i haven't
> really looked into yet.  Dealing with the compiled code will become even
> more of an issue when we let the tests run via "execnet" (see my other
> post).  In this case the runner might invoke multiple python interpreters
> at once and run tests in parallel, and it would be good to not simply
> write .pyc files at the same places where the .py files live. Wasn't there
> some option to python proposed to explicitly control the location where 
> .pyc files are created? 

There was a PEP, and some (mostly positive) discussion, but it lost 
momentum and got lost.  PEP 304, I think: 
http://www.python.org/peps/pep-0304.html

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org


