[py-dev] utest thoughts

holger krekel hpk at trillke.net
Fri Oct 1 18:44:12 CEST 2004


Hi Ian, hi everybody else, 

i still don't have much time but i can't wait any longer to reply :-) 

please Armin, Jens-Uwe, everybody: if you feel anything is
misrepresented, you have completely different ideas, or you
just want to comment, go ahead! We may want to split
the mail into different threads, though, if we discuss it further. 

[Ian Bicking Mon, Sep 27, 2004 at 06:58:44PM -0500]
> * Specify tests to run within a module.  The only way to select a module 
> that I see now is by filename.  Package name would also be nice. 
> Wildcards could also be useful, e.g., utest modulename.'*transaction*'. 
>  I think regular expressions are unnecessarily complex.  Maybe a 
> wildcard character other than * would be nice, to keep it from 
> conflicting with shell expansion.  A setting, or an optional alternative 
> character?  Maybe % (like in SQL).

Yes, selecting test methods by (maybe wildcarded) expressions seems
reasonable. However, i would first like to implement a 
"stateful testing" approach which would basically work like this: 

    py.test --session 
    ... runs all tests, some of which fail ... 
   
    py.test --session 
    ... runs only the previously failed tests ... 

    repeat the last step until all tests pass, 
    then run all tests --> go to step 1

I somehow think that this might satisfy many use cases which otherwise
would require some wildcard based selection scheme. 
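
A minimal sketch of how such failure persistence could work (the
state file name and the helpers below, including the 'name'
attribute on test objects, are invented for illustration, not the
real implementation):

    import os

    STATEFILE = ".py.test-session"   # hypothetical per-directory state file

    def load_failed():
        # return the names of tests that failed in the previous run
        if not os.path.exists(STATEFILE):
            return []
        f = open(STATEFILE)
        try:
            return [line.strip() for line in f if line.strip()]
        finally:
            f.close()

    def save_failed(names):
        # remember the currently failing tests for the next run
        f = open(STATEFILE, "w")
        try:
            f.write("\n".join(names))
        finally:
            f.close()

    def select_tests(all_tests):
        # run only previously failing tests; if none recorded, run all
        failed = load_failed()
        if failed:
            return [t for t in all_tests if t.name in failed]
        return all_tests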

> * Data-driven tests, where the same code is tested with many different 
> sets of data.  Naturally this is often done in a for loop, but it's 
> better if the data turns into multiple tests, each of which are 
> addressable.  There's something called a "unit" in there, I think, that 
> relates to this...?  But not the same thing as unittest; I think I saw 
> unittest compatibility code as well.
> 
> Anyway, with unittest I could provide values to the __init__, creating 
> multiple tests that differed only according to data, but then the runner 
> became fairly useless.  I'm hoping that will be easier with utest.

py.test consists of three main elements when "running" tests: 
the collector(s), the reporter (producing all output), 
and the runner, which drives the collector and the reporter. Collectors 
are completely decoupled from running tests. They are implemented 
(std/utest/collect.py) via a concept called "Restartable Iterators", 
so the runner does not need to know all the tests when it starts
running them.  

I know that unittest.py approaches often just put all these 
three elements together and call it a "runner", but i think 
this blurs the lines and makes it very difficult to keep things
extensible and customizable. 
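
To illustrate the division of labour (the interfaces below are
invented for illustration, not the actual py.test API): the runner
only drives a collector and a reporter and knows nothing else.

    def run(collector, reporter):
        # the collector is simply iterated over; it yields Unit objects
        reporter.start_session()
        for unit in collector:
            reporter.start_unit(unit)
            try:
                unit.execute()
            except AssertionError:
                reporter.report_failure(unit)
            except Exception:
                reporter.report_error(unit)
            else:
                reporter.report_success(unit)
        reporter.end_session()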

Having said this, for having multiple data sets it's probably best
to introduce a mechanism into the Module Collector to look for 
custom module-defined collectors which would seamlessly integrate 
into the collection process of Units.  By implementing your own 
collection function you can yield many Units with different 
test data sets which are then executed/accounted for individually. 
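
A rough sketch of what such a module-defined collection function
could look like (the hook name, the UnitClass constructor and its
arguments are all assumptions, not the existing API):

    DATASETS = [(2, 4), (3, 9), (10, 100)]

    def check_square(n, expected):
        # the actual test code, shared by all data sets
        assert n * n == expected

    def collect_square_units(UnitClass):
        # yield one individually addressable Unit per data set, so each
        # one is run, reported and failure-accounted separately
        for n, expected in DATASETS:
            yield UnitClass(name="test_square_%d" % n,
                            func=check_square, args=(n, expected))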
 
> * Specifying an option to the runner that gets passed through to the 
> tests.  It seems like the options are fixed now.  I'd like to do 
> something like -Ddatabase=mysql.  I can do this with environmental 
> variables now, but that's a little crude.  It's easiest if it's just 
> generic, like -D for compilers, but of course it would be nicer if there 
> were specific options.  Maybe this could be best achieved by 
> specializing utest and distributing my own runner with the project.

I see the point.  Doing this probably means reexamining the current
'config' mechanism, or better: rewriting it altogether, as it is very
much ad hoc/preliminary.  There is a provision for utest configuration 
files, currently called utest.conf, where at the moment you can only 
set some default values for command line options.  

with unittest.py there is a common habit of simply replacing 
the "runner", while with py.test it's most often better to write
just a custom reporter or a custom collector. 

Eventually, the py.test config file should allow specific 
collectors, reporters and maybe even runners for subdirectories. 

Architecting such a federated scheme of collectors/runners/reporters 
is not easy but i think very much worth it.  
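
Just to make the idea concrete, a hypothetical per-directory config
file could look roughly like the following (everything here is
invented; the current utest.conf only sets command line defaults):

    # utest.conf -- hypothetical per-directory test configuration
    options = {
        "database": "sqlite",    # overridable e.g. via -Ddatabase=mysql
        "verbose": 0,
    }

    # a directory could also plug in its own pieces of the machinery
    #collector = "mypkg.testing.MyCollector"
    #reporter  = "mypkg.testing.MyReporter"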

Note that it should always be possible to run the tests of any
application with py.test by invoking 'py.test APP-DIRECTORY' or 
simply 'py.test' from within the app directory.  At some point we 
may look into providing direct support for unittest.py style tests
to allow a seamless "upgrade", but this may be extremely messy 
with all those unittest.py hacks around. 

> * I'm not clear how doctest would fit in.  I guess I could turn the 
> doctest into a unit test TestCase, then test that.  Of course, it would 
> be nice if this was streamlined.  I also have fiddled with doctest to 
> use my own comparison functions when testing if we get the expected 
> output.  That's not really in the scope of utest -- that should really 
> go in doctest.  Anyway, I thought I'd note its existence.

Yes, well noted.  I think everyone agrees that integrating doctest 
is a good idea. I am just not using doctests currently, but i like the
idea.  How to integrate them into py.test is still open.  I guess
the rough idea is to extend the current collectors to look for 
docstrings and have them generate specific Units whose execute() 
method is invoked by our runner. The DoctestUnit.execute method 
would run a doctest.  Probably this also requires extending 
the TextReporter to support some nice kind of failure output. 
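
A very rough sketch of what such a DoctestUnit could look like (the
Unit interface assumed here is invented for illustration):

    import doctest

    class DoctestUnit:
        def __init__(self, module, name):
            self.module = module
            self.name = name

        def execute(self):
            # run all doctest examples found in the module's docstrings
            failures, tried = doctest.testmod(self.module, report=True)
            if failures:
                raise AssertionError("%d of %d doctest examples failed"
                                     % (failures, tried))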
 
> * Code coverage tracking.  This should be fairly straight-forward to add.

yes.  I guess everybody uses sys.settrace which is
unfortunately rather expensive. Anyway, it would be nice to
come up with some real life use cases and see what is really
useful. I am not clear on that. 
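
For reference, a minimal sys.settrace based line tracer looks roughly
like this (a real coverage tool would map the executed lines back to
source files and compute which lines were missed):

    import sys

    executed = {}   # (filename, lineno) -> True

    def tracer(frame, event, arg):
        if event == "line":
            executed[(frame.f_code.co_filename, frame.f_lineno)] = True
        return tracer

    def run_with_coverage(func, *args, **kwargs):
        sys.settrace(tracer)
        try:
            return func(*args, **kwargs)
        finally:
            sys.settrace(None)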
 
> The last time I looked around at test runners, Zope3's seemed the best. 
>  Well, it would have been better if I could have gotten it to do 
> something.  But it *seemed* best.  Mining it for features:
> 
> * Different levels of tests (-a --at-level or --all; default level is 1, 
> which doesn't run all tests).  They have lots of tests, so I'm guessing 
> they like to avoid running tests which are unlikely to fail.

having different levels of tests seems interesting. I'd like
more of a keyword based approach where all tests are
associated with some keywords and you can select tests by
providing keywords.  Keywords could be derived automatically
from filename and python name components, e.g.
['std', 'path', 'local', 'test_listdir'].  You could
additionally associate a 'slow' keyword with some tests (somehow) 
and start 'py.test --exclude="slow"' or put this as a default in
the configuration file. 
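
A small sketch of how such keyword selection could work (the
function names here are invented):

    def keywords_for(filepath, pyname, extra=()):
        # e.g. ('std', 'path', 'local', 'test_listdir') plus e.g. 'slow'
        parts = [p for p in filepath.replace(".py", "").split("/") if p]
        parts.extend(pyname.split("."))
        return parts + list(extra)

    def selected(keywords, include=(), exclude=()):
        # a test runs if it carries every include keyword
        # and none of the exclude keywords
        for kw in exclude:
            if kw in keywords:
                return False
        for kw in include:
            if kw not in keywords:
                return False
        return True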

> * A distinction between unit and functional tests (as in acceptance or 
> system tests).  This doesn't seem very generic -- these definitions are 
> very loose and not well agreed upon.  There's not even any common 
> language for them.  I'm not sure how this fits in with level, but some 
> sort of internal categorization of tests seems useful.

maybe use the keyword approach for this, too? 

> * A whole build process.  I think they run out of the build/ directory 
> that distutils generates.  It is a little unclear how paths work out 
> with utest, depending on where you run it from.  Setting PYTHONPATH to 
> include your development code seems the easiest way to resolve these 
> issues with utest.  I don't have anything with complicated builds, so 
> maybe there's issues I'm unaware of.

Simply providing a distutils-install should pose no problem and
we should do it, however there are a couple of interesting issues here: 

    - Armin and i want a mechanism for including '.c'
      files in the library and seamlessly compiling (and 
      possibly caching) them via the distutils machinery.  This
      should allow working with an svn checkout containing 
      c-coded modules without any explicit user interaction 
      and especially without distutils-installing it. 

    - for managing py library versions in the long run, i'd 
      like to have an easy automated way for users of the py lib
      to get/install a new version, preferably via a 'py' binary
      which also allows asking 'py --version', running the py-lib 
      tests via 'py --selftest', and 'py --test' which would iterate 
      into all modules/packages to find tests and run them. This is 
      a kind of integrity test for your system. Also, 'py someprogram.py' 
      could start an (interactive) python interpreter, allowing 
      the script to choose/restrict its version (on all platforms). 

    - eventually i'd like to think some more about the notion 
      of a 'py' application which would be installable/manageable, 
      probably connecting to the PyPI project but including 
      downloads. 
    
> * A pychecker option. (-c --pychecker)

makes sense, although i am not using it. I have the suspicion
that it would yell at py.magic :-) 
 
> * A pdb option (-D --debug).  I was able to add this to utest with 
> fairly small modifications (at least, if I did it correctly).

yes, nice, thanks. 
 
> * An option to control garbage collection (-g --gc-threshold).  I guess 
> they encounter GC bugs sometimes.

i'll let Armin comment on this :-) 
 
> * Run tests in a loop (-L --loop).  Also for checking memory leaks. 
> I've thought that running loops of tests in separate threads could also 
> be a useful test, for code that actually was supposed to be used with 
> threads.  That might be another place for specializing the runner.

Yes, see above. Maybe 'py.test --session' could actually not return 
until all tests pass, waiting for files to change in order to try again. 
This is really nice because you can just save from your editor and 
see the tests running (at least if you are working in multiple windows 
or on Xinerama like i do :-) 
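
A sketch of the waiting part (polling modification times; a real
implementation might do something smarter than polling):

    import os, time

    def wait_for_change(paths, interval=1.0):
        # block until one of the given files changes, return its path
        last = dict([(p, os.path.getmtime(p)) for p in paths])
        while True:
            time.sleep(interval)
            for p in paths:
                if os.path.getmtime(p) != last[p]:
                    return p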

> * Keep bytecode (-k --keepbytecode).  Not interesting in itself, but it 
> implies that they don't normally keep bytecode.  I expect this is to 
> deal with code where the .py file has been deleted, but the .pyc file is 
> still around.  I've wasted time because of that before, so I can imagine 
> its usefulness.

yes, there are some weird problems with py.test sometimes that i haven't
really looked into yet.  Dealing with compiled code will become even more
of an issue when we let the tests run via "execnet" (see my other post).
In that case the runner might invoke multiple python interpreters at once
and run tests in parallel, and it would be good not to simply write .pyc 
files in the same places where the .py files live. Wasn't there some
option proposed for python to explicitly control the location where 
.pyc files are created? 
 
> * Profiling (-P --profile).  Displays top 50 items, by time and # of calls.

yip. Especially since the hotshot API is slightly verbose to use. 
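
For reference, the hotshot dance for "top 50 by time and calls" looks
roughly like this (the log file name is chosen arbitrarily):

    import hotshot, hotshot.stats

    def profile_call(func, *args):
        prof = hotshot.Profile("py.test.prof")
        try:
            result = prof.runcall(func, *args)
        finally:
            prof.close()
        stats = hotshot.stats.load("py.test.prof")
        stats.sort_stats("time", "calls")
        stats.print_stats(50)    # the top 50 entries
        return result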
 
> * Report only first doctest failure (-1 
> --report-only-first-doctest-failure).

yip. 
 
> * Time the tests and show the slowest 50 tests (-t --top-fifty).  I 
> first thought this was just a bad way of doing profiling, but now that I 
> think about it this is to diagnose problems with the tests running slowly.

Yes! 
 
> That's all the interesting options, I think.  There's also options to 
> select which tests you display, but these seem too complex, while still 
> not all that powerful.

See my idea about federated collectors / reporters / runners.  If we 
get this right then such interesting options become very viable. 

OK, enough for now, i guess. 

Ian, thanks for coming to us and helping to move the py lib along! 
While i am currently the main driver, i am very happy to share 
decisions, ideas and work, especially with knowledgeable python
developers. 

cheers, 

    holger


