unit-profiling, similar to unit-testing

spartan.the spartan.the at gmail.com
Thu Nov 17 16:28:56 EST 2011


On Nov 17, 4:03 pm, Roy Smith <r... at panix.com> wrote:
> In article <kkuep8-nqd.... at satorlaser.homedns.org>,
>  Ulrich Eckhardt <ulrich.eckha... at dominolaser.com> wrote:
>
> > Yes, this is surely something that is necessary, in particular since
> > there are no clear success/failure outputs like for unit tests and they
> > require a human to interpret them.
>
> As much as possible, you want to automate things so no human
> intervention is required.
>
> For example, let's say you have a test which calls foo() and times how
> long it takes.  You've already mentioned that you run it N times and
> compute some basic (min, max, avg, sd) stats.  So far, so good.
>
> The next step is to do some kind of regression against past results.
> Once you've got a bunch of historical data, it should be possible to
> look at today's numbers and detect any significant change in performance.
>
> Much as I loathe the bureaucracy and religious fervor which has grown up
> around Six Sigma, it does have some good tools.  You might want to look
> into control charts (http://en.wikipedia.org/wiki/Control_chart).  You
> think you've got the test environment under control, do you?  Try
> plotting a month's worth of run times for a particular test on a control
> chart and see what it shows.
>
> Assuming your process really is under control, I would write scripts
> that did the following kinds of analysis:
>
> 1) For a given test, do a linear regression of run time vs date.  If the
> line has any significant positive slope, you want to investigate why.
>
> 2) You already mentioned, "I would even wonder if you can't verify the
> behaviour against an expected Big-O complexity somehow".  Of course you
> can.  Run your test a bunch of times with different input sizes.  I
> would try something like a 1-2-5 progression over several decades (i.e.
> input sizes of 10, 20, 50, 100, 200, 500, 1000, etc.).  You will have to
> figure out what an appropriate range is, and how to generate useful
> input sets.  Now, curve fit your performance numbers to various shape
> curves and see what correlation coefficient you get.
>
> All that being said, in my experience, nothing beats plotting your data
> and looking at it.

I strongly agree with Roy here.
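
For example, his point 1 takes only a few lines of Python once the
daily numbers are sitting in a CSV file. This is only a rough sketch:
the file layout ("date,test,seconds" rows in timings.csv) and the
5%-per-day threshold are things I made up, not anything from your
setup.

# trend_check.py -- flag tests whose run time creeps upward over time
# (assumes CSV rows like: 2011-11-17,test_foo,0.0421)
import csv
from datetime import datetime

import numpy as np


def load_history(path):
    """Return {test name: [(date, seconds), ...]} read from the CSV."""
    history = {}
    with open(path) as f:
        for date_str, name, seconds in csv.reader(f):
            history.setdefault(name, []).append(
                (datetime.strptime(date_str, "%Y-%m-%d"), float(seconds)))
    return history


def slope_per_day(rows):
    """Least-squares slope of run time (seconds) versus age in days."""
    days = np.array([(d - rows[0][0]).days for d, _ in rows], dtype=float)
    times = np.array([t for _, t in rows])
    slope, _intercept = np.polyfit(days, times, 1)
    return slope


if __name__ == "__main__":
    for name, rows in sorted(load_history("timings.csv").items()):
        rows.sort()
        slope = slope_per_day(rows)
        mean = sum(t for _, t in rows) / len(rows)
        if slope > 0.05 * mean:   # grows >5% of the mean per day
            print("investigate %s: +%.4f s/day" % (name, slope))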

Ulrich, I recommend taking a look at how Google measures App Engine's
health here: http://code.google.com/status/appengine.
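
Roy's Big-O check is also cheap to try: time the same call at the
10-20-50-100-... sizes he suggests and fit a straight line through the
points on a log-log scale; the slope roughly gives the polynomial
order. Again just a sketch (it only separates polynomial growth rates,
and foo()/make_input() below are placeholders for your own code):

# complexity_check.py -- rough empirical growth-order estimate for foo()
import timeit

import numpy as np

SIZES = [10, 20, 50, 100, 200, 500, 1000, 2000, 5000]


def foo(data):                 # placeholder for the function under test
    return sorted(data)


def make_input(n):             # placeholder input generator
    return list(range(n, 0, -1))


def estimate_order():
    times = []
    for n in SIZES:
        data = make_input(n)
        # best-of-5 repeats keeps some of the scheduler noise out
        t = min(timeit.repeat(lambda: foo(data), number=10, repeat=5))
        times.append(t)
    slope, _ = np.polyfit(np.log(SIZES), np.log(times), 1)
    return slope               # ~1 means linear, ~2 quadratic, and so on


if __name__ == "__main__":
    print("estimated polynomial order: %.2f" % estimate_order())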

Unit tests are inappropriate here: a unit test can only answer PASS or
FAIL, yes or no; it can't answer the question "how much". (Unless you
simply insist on using unit tests, in which case the arguments here
don't apply.)

I suggest:

1. Decide what you want to measure. Each measurement should be a single
number in a known range (say 0..100 or -5..5), so you can plot it.
2. Write small no-UI programs that take each measurement and append it
to a CSV file. Run each of them several times, throw away the single
best and the single worst result, and average the rest (see the sketch
after this list).
3. Collect the data for some period of time.
4. Plot those averages over a time axis (easy to do from CSV).
5. Automate the whole thing (batch files or the like) so the plot is
regenerated every hour or every day.
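
For step 2, a minimal measuring script could look like the one below.
It's only a sketch: measure_once() is a stand-in for whatever you
actually measure, and the one-value-per-row CSV layout is my own
invention, not anything standard.

# measure.py -- one trimmed-and-averaged measurement appended to a CSV
import csv
import time
import timeit

RUNS = 7
CSV_PATH = "response_time.csv"


def measure_once():
    """Placeholder: return one number in a known range (here: seconds)."""
    return timeit.timeit("sorted(range(1000, 0, -1))", number=100)


def trimmed_average(runs=RUNS):
    values = sorted(measure_once() for _ in range(runs))
    values = values[1:-1]            # drop the single best and worst run
    return sum(values) / len(values)


def append_result(path=CSV_PATH):
    with open(path, "a") as f:
        csv.writer(f).writerow(
            [time.strftime("%Y-%m-%d %H:%M"), "%.6f" % trimmed_average()])


def plot(path=CSV_PATH):             # step 4: averages over a time axis
    import matplotlib.pyplot as plt
    with open(path) as f:
        rows = list(csv.reader(f))
    plt.plot([float(value) for _, value in rows])
    plt.xticks(range(len(rows)), [stamp for stamp, _ in rows], rotation=90)
    plt.ylabel("trimmed average (s)")
    plt.tight_layout()
    plt.savefig("response_time.png")


if __name__ == "__main__":
    append_result()
    plot()

Scheduling that script from cron or the Windows task scheduler then
covers step 5.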

Then, after a month, you can decide whether you want to divide your
number ranges into green/yellow/red zones (see the sketch below). More
often than not you'll find that your measurements are so noisy and
random that you can't trust them. At that point you'll either drop the
idea or dive into the math (statistics). You have about a 5% chance of
succeeding ;)
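
If you do get as far as the zones, the usual control-chart rule that
Roy pointed to is easy to script: green within 2 standard deviations
of a known-good baseline, yellow within 3, red beyond that. Again a
sketch, reusing the made-up CSV from the previous snippet:

# zones.py -- classify the latest number against a known-good baseline
import csv
import math


def load_values(path="response_time.csv"):
    with open(path) as f:
        return [float(value) for _, value in csv.reader(f)]


def zone(value, baseline):
    """Green within 2 sigma of the baseline mean, yellow within 3."""
    mean = sum(baseline) / len(baseline)
    sigma = math.sqrt(sum((v - mean) ** 2 for v in baseline) / len(baseline))
    if abs(value - mean) <= 2 * sigma:
        return "green"
    if abs(value - mean) <= 3 * sigma:
        return "yellow"
    return "red"


if __name__ == "__main__":
    values = load_values()
    baseline, latest = values[:-1], values[-1]
    print("latest run is in the %s zone" % zone(latest, baseline))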


