Benchmarking some modules - strange result

Dan Stromberg drsalists at gmail.com
Sat Jan 24 21:11:43 EST 2015


Hi folks.

I've been benchmarking some Python modules that are mostly variations
on the same theme.

For simplicity, let's say I've been running the suite of performance
tests within a single interpreter - so I test one module thoroughly,
then move on to the next without exiting the interpreter.

I'm finding that if I prune the list of modules down to just the best
performers, I get pretty different results - what was best no longer
is.  This strikes me as strange.

I'm forcing a garbage collection between tests with gc.collect(), but
even so I still get the oddball results.
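
For concreteness, the single-interpreter harness is shaped roughly like
this (simplified; workload_a and workload_b are placeholders for calls
into the real modules):

import gc
import timeit

# Placeholder workloads - in the real harness these call into the
# modules being compared.
def workload_a():
    sum(range(10000))

def workload_b():
    sorted(range(10000), reverse=True)

CANDIDATES = [("module_a", workload_a), ("module_b", workload_b)]

results = {}
for name, workload in CANDIDATES:
    gc.collect()                      # force a collection between tests
    timer = timeit.Timer(workload)
    results[name] = min(timer.repeat(repeat=5, number=1000))

for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(name, seconds)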

Is there something else I should do to restore the interpreter to a
known-pristine state for the sake of such tests?

BTW, there isn't much else running on this computer, except possibly
some small cronjobs.

I'm about ready to rewrite things to run each individual test in a
fresh interpreter. But is there a better way?
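
What I'm picturing is something along these lines, with subprocess
starting a fresh interpreter for each test (bench_one.py would be a
small helper script, not yet written, that times one named module and
prints the elapsed seconds):

import subprocess
import sys

MODULE_NAMES = ["module_a", "module_b", "module_c"]   # placeholders

def time_in_fresh_interpreter(module_name):
    # bench_one.py is a hypothetical helper that benchmarks the one
    # named module and prints the elapsed seconds on stdout.
    output = subprocess.check_output(
        [sys.executable, "bench_one.py", module_name])
    return float(output.decode("ascii").strip())

for name in MODULE_NAMES:
    print(name, time_in_fresh_interpreter(name))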

If I do the rewrite, I might run 1 run of module A, 1 run of module B,
1 run of module C, then another run of module A, another run of module
B, and another run of module C - to spread any possible timing
oddities more evenly.
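
In other words, interleave the runs rather than grouping them by
module - roughly:

from collections import defaultdict

MODULE_NAMES = ["module_a", "module_b", "module_c"]   # placeholders
ROUNDS = 3

timings = defaultdict(list)
for round_number in range(ROUNDS):
    for name in MODULE_NAMES:    # one run of A, then B, then C, per round
        # time_in_fresh_interpreter() is the helper sketched above.
        timings[name].append(time_in_fresh_interpreter(name))

# Report the best (minimum) of the interleaved runs for each module.
for name in MODULE_NAMES:
    print(name, min(timings[name]))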

Thanks.


