unicode() vs. s.decode()

Sun Aug 9 06:01:25 EDT 2009

On Sat, 08 Aug 2009 19:00:11 +0200, Thorsten Kampe wrote:

>> I was running it one million times to mitigate influences on the timing
>> by other background processes which is a common technique when
>> benchmarking.
> 
> Err, no. That is what "repeat" is for and it defaults to 3 ("This means
> that other processes running on the same computer may interfere with the
> timing. The best thing to do when accurate timing is necessary is to
> repeat the timing a few times and use the best time. [...] the default
> of 3 repetitions is probably enough in most cases.")

It's useful to look at the timeit module to see what the author(s) think.

Let's start with the repeat() method. In the Timer docstring:

"The repeat() method is a convenience to call timeit() multiple times and 
return a list of results."

and the repeat() method's own docstring:

"This is a convenience function that calls the timeit() repeatedly, 
returning a list of results.  The first argument specifies how many times 
to call timeit(), defaulting to 3; the second argument specifies the 
timer argument, defaulting to one million."

So it's quite obvious that the module author(s), and possibly even Tim 
Peters himself, consider repeat() to be a mere convenience method. 
There's nothing you can do with repeat() that can't be done with the 
timeit() method itself.
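
To make that concrete, here is a minimal sketch (not the module's own source) of getting the same list of results with and without repeat(). The statement being timed and the counts are arbitrary choices for illustration.

```python
import timeit

# Arbitrary statement and counts, chosen only for illustration.
timer = timeit.Timer("sum(range(100))")

# The convenience method: 3 timings of 1000 executions each.
via_repeat = timer.repeat(repeat=3, number=1000)

# The same thing done by hand with timeit() alone.
via_timeit = [timer.timeit(number=1000) for _ in range(3)]

print(len(via_repeat), len(via_timeit))
```

Both are lists of three total times in seconds. The actual values will differ from run to run.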

Notice that both the repeat() and timeit() methods take an argument to 
specify how many times to execute the code snippet. Why not just execute 
it once? The module doesn't say, but the answer is a basic measurement 
technique: if your clock is accurate to (say) a millisecond, and you 
measure a single event as taking a millisecond, then your relative error 
is roughly 100%. But if you time 1000 events, and measure the total time 
as 1 second, the relative error is now 0.1%.
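
The arithmetic can be sketched as follows. The helper name relative_error is made up for this example, and it assumes the only error is one tick of clock resolution on the total measurement.

```python
def relative_error(clock_resolution, total_time):
    """Hypothetical helper: worst-case relative error when the measured
    total may be off by one clock tick."""
    return clock_resolution / total_time

resolution = 0.001  # assumed clock resolution: one millisecond

# One event measured as taking a millisecond.
single = relative_error(resolution, 0.001)

# 1000 events measured as taking one second in total.
batch = relative_error(resolution, 1.0)

print("single event: %.1f%%" % (single * 100))
print("1000 events:  %.1f%%" % (batch * 100))
```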

The authors of the timeit module obviously considered this an important 
factor: not only did they allow you to specify the number of times to 
execute the code snippet (defaulting to one million, not to one) but they 
had this to say:

[quote]
Command line usage:
    python timeit.py [-n N] [-r N] [-s S] [-t] [-c] [-h] [statement]

Options:
  -n/--number N: how many times to execute 'statement'
 [...]

If -n is not given, a suitable number of loops is calculated by trying
successive powers of 10 until the total time is at least 0.2 seconds.
[end quote]

In other words, when calling the timeit module from the command line, by 
default it will choose a value for n that gives a sufficiently small 
relative error.
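
A rough sketch of the search described in the quoted help text might look like the code below. This is my own approximation, not the code in timeit.py: pick_number, its parameters, and the upper limit on the exponent are all assumptions made for the example.

```python
import timeit

def pick_number(stmt, setup="pass", min_total=0.2, max_exponent=9):
    """Hypothetical helper: try successive powers of 10 and return
    (number, total_time) for the first one whose total run time is at
    least min_total seconds, giving up after 10**max_exponent loops."""
    timer = timeit.Timer(stmt, setup)
    for exponent in range(1, max_exponent + 1):
        number = 10 ** exponent
        total = timer.timeit(number)
        if total >= min_total:
            break
    return number, total

# Arbitrary statement chosen for illustration.
number, total = pick_number("sorted(range(1000))")
print("%d loops took %.3f seconds in total" % (number, total))
```

The number chosen depends on how fast the statement runs on the machine at hand, so a quick statement gets many loops and a slow one gets few.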

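It's not an accident that timeit gives you two "count" parameters: the 
number of times to execute the code snippet per timing, and the number of 
timings. They control (partly) for different sources of error: the first 
reduces the relative error due to limited clock resolution, and the second 
guards against interference from other processes.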
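
Putting the two counts together, a minimal sketch of a measurement could look like this. The statement and the particular counts are arbitrary choices for illustration; taking the minimum follows the "use the best time" advice in the documentation quoted earlier.

```python
import timeit

# Arbitrary statement chosen for illustration.
timer = timeit.Timer("'-'.join(map(str, range(50)))")

number = 10000  # executions per timing: shrinks clock-resolution error
repeat = 3      # number of timings: guards against background interference

timings = timer.repeat(repeat=repeat, number=number)

# Take the best (smallest) total and convert it to time per execution.
best_per_call = min(timings) / number
print("best of %d: %.3g seconds per loop" % (repeat, best_per_call))
```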

-- 
Steven