Big-O notation (was: Re: Python and Schools)

Tue Apr 15 20:16:09 EDT 2003

On Tue, 2003-04-15 at 15:58, Roy Smith wrote:

> We've got something which scans a directory (an O(N) process).  It's 
> supposed to happen once every time the program is run.  Somebody goofed 
> in coding, and moved it into the loop that happens once every input 
> file, not once per run.  Now it's an O(N^2) process.
> 
> Nobody noticed it in testing because we only tested it with maybe 100 
> files.  Then one of our customers ran it with 25,000 files (a perfectly 
> reasonable thing to do).  Whoops :-)

I have unit tests that actually try to detect these things using
regressions and t-tests.  For example, I have an implementation of a
numerical algorithm that I know is supposed to be O(N).  In my tests, I
time the result of the algorithm with varying input sizes (ie. different
N), and do a quadratic regression (like a linear regression, but with a
term for x**2).

My assumption is that the x**2 term should have a coefficient near zero
(ie. zero "curve" to the regression).  I then use a t-test to verify
this assumption, using a few runs on random data.  Set the confidence
level fairly low, and it seems to do a good job of detecting if I
accidentally make a O(N**2) algorithm, rather than the expected O(N)
(with very few false positives).

Fairly quick and easy to implement (given existing t-test and regression
toolks available in stats.py and Numeric)

-- 

Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)