How to make this faster

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Jul 5 23:05:30 EDT 2013


On Fri, 05 Jul 2013 18:39:15 +0000, Helmut Jarausch wrote:

> On Fri, 05 Jul 2013 16:50:41 +0000, Steven D'Aprano wrote:
> 
>> On Fri, 05 Jul 2013 16:07:03 +0000, Helmut Jarausch wrote:
>> 
>>> The solution above take 0.79 seconds (mean of 100 calls) while the
>>> following version take 1.05 seconds (mean of 100 calls):
>> 
>> 1) How are you timing the calls?
> 
> I've put the work horse "Solve" into a loop executing 100 times. That's
> on an otherwise idle Linux machine.

That doesn't explain how you time it, only that you have a loop executing 
100 times. Are you using time.time, or time.clock? (I trust you're not 
measuring times by hand with a stopwatch.)

I expect you're probably doing something like this:

import time

start = time.time()
for i in range(100):
    solve(data)
t = time.time() - start
# report the average time per call
print "time taken", t/100


In this case, you're getting times well over a second, so the overhead is 
unlikely to be significant. E.g. the time it takes to create range(100) 
will be insignificant compared to the time it takes to run solve. 
Nevertheless, as a matter of good practice, I recommend that you use the 
timeit module, as it's carefully written to minimize the overhead when 
timing small code snippets where the overhead is *not* insignificant. 

http://docs.python.org/2/library/timeit.html
http://docs.python.org/3/library/timeit.html
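
For example, here is a minimal sketch of timing solve with timeit (it 
assumes your solve function and data can be imported from a hypothetical 
module called "solver"; adjust the setup string to match your actual code):

import timeit

timer = timeit.Timer("solve(data)",
                     setup="from solver import solve, data")
# repeat() returns one total time per run; each run calls solve() 10 times
times = timer.repeat(repeat=5, number=10)
# report the best (minimum) per-call time, not the mean
print(min(times) / 10)

The minimum of the five runs is the figure worth comparing between 
versions of solve.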

For longer-running code like this, you might also find this recipe useful:

http://code.activestate.com/recipes/577896/



>> 2) Don't use the mean, that's the wrong statistic when you are
>> measuring something where the error is always one sided. You want the
>> minimum, not the mean.
> 
> Here you assume time measuring itself is without error - I doubt that.

No, I don't. I may have glossed over the sources of measurement error, 
but the important thing is that they are all *positive*. The time 
measured will never be *less* than the actual time taken. So regardless 
of the source of measurement error, the way to minimize it is to look 
only at the minimum timing, not the mean.
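
Here's a toy illustration of that (not your actual timings, just 
simulated numbers where the noise is strictly one-sided):

import random

true_time = 1.00
# interruptions and timer overhead only ever *add* time
measured = [true_time + random.uniform(0.0, 0.2) for _ in range(100)]
print(sum(measured) / len(measured))   # the mean is biased upward, ~1.10
print(min(measured))                   # the minimum stays close to 1.00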


> If the timing version, which executes function "Solve" one hundred
> times, runs about 80-100 seconds without a significant variation, then
> taking the mean is mathematically correct.

If the best you can say is that it takes "80-100 seconds", that's a 
pretty significant variation, of the order of 20%.

In this case, with times of the order of a second per loop, it may be 
reasonable to say "in this specific case, the error is too small to care 
about", or "I just don't care about the error, since it will be about the 
same for different variations of my solve function". But in that case, 
why bother looping 100 times?


> I can't take the minimum
> since I don't measure the time a single call takes.

Then perhaps you should.
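
A minimal sketch of doing that (again with the hypothetical solve and 
data standing in for your real code): time each call separately, then 
keep the smallest.

import time

per_call = []
for i in range(100):
    start = time.time()
    solve(data)
    per_call.append(time.time() - start)

# the smallest measurement is the one least disturbed by other
# processes and by the timing overhead itself
print(min(per_call))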


-- 
Steven


