timeit.timeit and timeit.repeat give different answers

Tue Jul 13 20:27:04 EDT 2004

Peter Otten <__peter__ at web.de> writes:

[lots of noisy data]
>
> And now what? 
>
> I'd just stick with the commandline's "best of N" strategy. 

As I tried to make clear, this isn't a question of noisy data.  Each of
the two functions gives consistent numbers from one run to the next, but
the two consistently disagree with each other.  In more detail:

jdc@itchy:~$ unset PYTHONPATH
jdc@itchy:~$ python
Python 2.3.4 (#2, Jun 19 2004, 18:15:30) 
[GCC 3.3.4 (Debian)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.Timer()

# A t.timeit group:

>>> t.timeit()
0.088190078735351562
>>> t.timeit()
0.08897709846496582
>>> t.timeit()
0.087677955627441406
>>> t.timeit()
0.089442968368530273

# A t.repeat group:

>>> t.repeat(1)
[0.073434114456176758]
>>> t.repeat(1)
[0.073648929595947266]
>>> t.repeat(1)
[0.074430942535400391]
>>> t.repeat(1)
[0.07358098030090332]
>>> t.repeat(1)
[0.073504924774169922]

# t.timeit in a for loop:

>>> for i in range(10):
...     print t.timeit()
... 
0.0822348594666
0.0836551189423
0.0827491283417
0.0818839073181
0.0807418823242
0.0863690376282
0.0951108932495
0.0798268318176
0.080157995224
0.081778049469

# t.repeat with 10 outputs (line breaks added):

>>> t.repeat(10)
[0.074424982070922852, 
0.074212789535522461, 
0.074655055999755859, 
0.074471950531005859, 
0.074034929275512695, 
0.074390172958374023, 
0.074491024017333984, 
0.074390172958374023, 
0.074035882949829102, 
0.076071023941040039]

There is definitely something fishy going on here.
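
One way to rule out drift between the separate groups above is to
interleave the two calls, so that whatever the machine is doing at the
time affects both equally.  The following is only a sketch: it uses
Python 3 syntax rather than the 2.3 session shown here, and NUMBER is an
assumed, reduced loop count chosen to keep the run short.

```python
import timeit

NUMBER = 200000     # assumption: fewer loops than the default 1000000
t = timeit.Timer()  # times the default statement, "pass"

pairs = []
for _ in range(5):
    direct = t.timeit(NUMBER)            # a single call to timeit
    via_repeat = t.repeat(1, NUMBER)[0]  # the same measurement through repeat
    pairs.append((direct, via_repeat))

# Print the two measurements side by side for comparison.
for direct, via_repeat in pairs:
    print("timeit: %.6f   repeat: %.6f" % (direct, via_repeat))
```

If the two columns still differ consistently when measured back to back,
the difference is not down to when each group was run.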

> Also, I would
> expect the "pass" statement to be the fastest to execute and therefore the
> least accurate to measure.

The problem happens with more complicated tests as well, as long as
they aren't too complicated.  But you are right that for anything
significant, the noise dominates.  

I'd still like to understand why this happens.  The t.repeat method is
very simple:

        r = []
        for i in range(repeat):
            t = self.timeit(number)
            r.append(t)
        return r
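
To make the comparison concrete, the same loop can be written outside
the class and run next to the built-in method.  This is a minimal sketch
in Python 3 syntax; manual_repeat is a hypothetical helper, not part of
timeit, and the loop count is an assumed small value.

```python
import timeit

NUMBER = 200000  # assumption: reduced loop count for a quick run

def manual_repeat(timer, repeat=5, number=NUMBER):
    """Hypothetical helper that mirrors the loop inside Timer.repeat."""
    results = []
    for _ in range(repeat):
        results.append(timer.timeit(number))
    return results

t = timeit.Timer()  # default statement is "pass"
by_hand = manual_repeat(t)
built_in = t.repeat(5, NUMBER)

print("by hand: ", min(by_hand))
print("built in:", min(built_in))
```

Since both paths do the same work, any consistent gap between them would
have to come from somewhere other than the loop itself.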

Hmm, if I type this at the python prompt:

r = []
for i in range(1):  # Note: only one loop!
    ti = t.timeit()
    r.append(ti)
print r

I get data that agrees with t.repeat but disagrees with:

for i in range(1):
    print t.timeit()

or

t.timeit()

or

ti=t.timeit()
print ti

The last three are *slower* than the one that uses append.

The same thing happens with the timeit module from Python 2.4, which
disables garbage collection during the timing, so I don't see why the
two methods should produce different answers.
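
For anyone who wants to check the garbage collection angle directly, the
collector can be switched back on from the setup string.  A sketch in
Python 3 syntax, with an assumed reduced loop count:

```python
import timeit

NUMBER = 200000  # assumption: reduced loop count for a quick run

# Default behaviour: timeit switches the garbage collector off while timing.
gc_off = timeit.Timer("pass").repeat(5, NUMBER)

# The setup string runs after timeit has disabled the collector,
# so it can switch it back on for the timed loop.
gc_on = timeit.Timer("pass", "import gc; gc.enable()").repeat(5, NUMBER)

print("gc disabled, best of 5:", min(gc_off))
print("gc enabled,  best of 5:", min(gc_on))
```

If the two results are close, garbage collection is unlikely to be
behind the discrepancy.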

Further clues:

>>> def doit():  
...     print t.timeit()
... 
>>> doit()
0.0791070461273
>>> doit()
0.0793399810791
>>> doit()
0.0791549682617
>>> doit()
0.0794830322266
...etc...

>>> t.timeit()
0.086545944213867188
>>> t.timeit()
0.085171937942504883
>>> t.timeit()
0.089100122451782227
>>> t.timeit()
0.089087963104248047

Could it have something to do with the namespace in effect?
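
A rough way to probe that idea is to take the same measurement with the
result bound to a global name and to a function-local name.  This is
only a sketch in Python 3 syntax; timed_in_function and NUMBER are
illustrative names, and it makes no claim about what the outcome will
be.

```python
import timeit

NUMBER = 200000     # assumption: reduced loop count for a quick run
t = timeit.Timer()  # default statement is "pass"

def timed_in_function():
    # The result is bound to a function-local name.
    result = t.timeit(NUMBER)
    return result

at_top_level = []
for _ in range(5):
    # When run as a script, elapsed is a module-level (global) name.
    elapsed = t.timeit(NUMBER)
    at_top_level.append(elapsed)

in_function = [timed_in_function() for _ in range(5)]

print("top level, best of 5:", min(at_top_level))
print("function,  best of 5:", min(in_function))
```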

Dan