is there better 32 clock() timing?

Bengt Richter bokr at oz.net
Wed Jan 26 03:11:38 EST 2005


On Tue, 25 Jan 2005 15:46:30 +0000, Stephen Kellett <snail at objmedia.demon.co.uk> wrote:

>In message <41f63d79.1778950866 at news.oz.net>, Bengt Richter 
><bokr at oz.net> writes
>>I believe that is quite wrong as a general statement.
>
>Actually my initial statement should have been written
>"accessing a resource, the accuracy of which is no better than 10ms.". I 
>was thinking of the 1ms multimedia timer but wrote about clock() 
>instead.
>
>10ms, coincidentally is the approx minimum scheduling granularity for 
>threads unless you are in a multimedia thread (or real time thread - not 
>sure about real time threads in NT).
>
>>If the "resource" only had ~1ms granularity,
>>the minimum would be zero, as it is if you call time.time() in a tight loop,
>
>Correct. Write your app in C and call clock(). That's what you get. You 
>can call clock 20000 times and still get a delta of zero. The next delta 
>(on my system) is 10ms at about 22000 calls.
>
>>>There are various timers available, documented and undocumented, all of
>>>which end up at 1ms or 1.1ms, give or take. For anything shorter you
>
>Whoops here we go, same typo - should have been 10ms or 11ms. There is a 
>1ms timer in the multimedia timing group.
>
>>>need QueryPerformanceCounter() (but that *is* a slow call), or use the
>>Have you timed it, to make that claim?
>
>Yes.
>
>>What do you mean by "slow"?
>
>Slower than any other Win32, CRT or undocumented NT function you can use 
>to get timer information. Yes, I have timed them all, a few years ago.
>
>QueryPerformanceCounter is 47 times slower to call than clock() on my 
>1 GHz Athlon.
That really makes me wonder. Perhaps the Athlon handles RDTSC by way of
an illegal-instruction trap that emulates the Pentium instruction? That might
explain the terrible timing. Try it on a Pentium that supports RDTSC.
The clock() usually gets its high-resolution bits from the low-order 16 bits
of the timer chip that drives the old 55 ms clock, which IBM derived from a
cheap TV-crystal-based oscillator instead of defining an
OS-implementer-friendly time base, I think. The frequency was nominally
1193182 Hz, I believe. Obviously the OS didn't get interrupted on every count,
but if you divide that by 2**16, you get the traditional OS tick of ~55 ms:

 >>> 1193182./2**16
 18.206512451171875
 >>> (1193182./2**16)**-1
 0.054925401154224583

So that's a clue. By grabbing the tick count plus the bits read from that
fast-counting hardware counter register, you can compute the time fairly
accurately for the moment you sample the register. IIRC, you couldn't get the
whole 16 bits because it was a toggling trigger or some such.
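
If you want to see what that scheme actually delivers on a given box, here is
a quick interactive check in the same Python 2 style as the sessions above:
sample time.clock() back to back and look at the smallest nonzero delta. The
number is entirely machine- and OS-dependent, so take it as a sketch, not a
benchmark.

 >>> import time
 >>> deltas = [abs(time.clock() - time.clock()) for i in xrange(10**5)]
 >>> min(d for d in deltas if d > 0)   # smallest nonzero step observed

(On a box where clock() really only moves every 10 ms, most of those deltas
come back zero and only the occasional tick boundary shows up.)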

>
>QueryPerformanceCounter may have finer granularity, but called in a 
>tight loop it'll crush your program.
Maybe on your Athlon, but my experience is different ;-)
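
For anyone who wants to check this on their own box, here is a rough sketch of
comparing the per-call cost from Python on Windows, assuming the ctypes
package is available. Keep in mind that interpreter and ctypes dispatch
overhead dominates both loops, so this only bounds the ratio; it won't
reproduce a C-level measurement like the 47x figure above. Note also that
CPython's time.clock() on Windows is itself built on QueryPerformanceCounter,
so it is only a stand-in for the C runtime's clock().

 >>> import time, ctypes
 >>> qpc = ctypes.windll.kernel32.QueryPerformanceCounter
 >>> buf = ctypes.c_longlong(0)
 >>> pbuf = ctypes.byref(buf)
 >>> def percall(f, n=10**5):
 ...     t0 = time.clock()
 ...     for i in xrange(n):
 ...         f()
 ...     return (time.clock() - t0) / n
 ...
 >>> percall(lambda: qpc(pbuf))   # QueryPerformanceCounter via ctypes
 >>> percall(time.clock)          # clock() as Python exposes it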

>
>>>RDTSC instruction which is fast but gives a count of instruction cycles
>>>executed and is thus not totally accurate (multiple execution pipelines,
>>>plus multithreading considerations).
>>Accurate for what.
>
>See below - you haven't taken things into account, despite my comment in 
>brackets above which gives a big hint.
I've absorbed a lot of hints since around '59 when I began to work with
computers and timing issues ;-)

>
>>A single clock AFAIK drives RDTSC
>
>Correct.
>
>>The main problem with a CPU clock based reading is that it's very stable unless
>>there's variable clock rate due to power management.
>
>Try running multiple apps at the same time you are doing your 
>measurement, each of which has a variable loading. Each of these apps is 
>contributing to the count returned by RDTSC. That is what I was 
>referring to.

Ok, but that's another issue, which I also attempted to draw attention to ;-)

Quoting myself:
"""
>
>Even with the attribute lookup overhead, it's not several hundred microseconds
>as a *minimum*. But on e.g. win32 you can get preempted for a number of milliseconds.
>E.g., turn that to a max instead of a min:
>
>I see a couple 20-30 ms ones ;-/
>
> >>> max(abs(time.clock()-time.clock()) for i in xrange(10**5))
> 0.0085142082264155761
> >>> max(abs(time.clock()-time.clock()) for i in xrange(10**5))
> 0.0088125700856949152
> >>> max(abs(time.clock()-time.clock()) for i in xrange(10**5))
> 0.0022125710913769581
> >>> max(abs(clock()-clock()) for i in xrange(10**5))
> 0.023374472628631793
> >>> max(abs(clock()-clock()) for i in xrange(10**5))
> 0.030183995400534513
> >>> max(abs(clock()-clock()) for i in xrange(10**5))
> 0.0017130664056139722
> >>> max(abs(clock()-clock()) for i in xrange(10**5))
> 0.0070844179680875641
>
"""

Depending on what your timing requirements are, you may be able to run
a zillion trials and throw out the bad data (or plot it and figure out
some interesting things about timing behavior of your system due to various
effects). E.g., timeit presumably tries to get a minimum and eliminate as
many glitches as possible.
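
For example, a minimal illustration of that best-of-several-runs idea with the
standard timeit module (the statement timed here is just a placeholder):

 >>> import timeit
 >>> t = timeit.Timer("x = 2**16")
 >>> min(t.repeat(repeat=5, number=10**5))   # best of 5 runs

Taking the minimum of the repeats is exactly the throw-out-the-bad-data
strategy: preemptions and other glitches only ever make a run slower, never
faster.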

But as mentioned, the big picture of requirements was not clear. Certainly
you can't expect to control the ignition of a racing engine reliably with
an ordinary Windows-based program ;-)

Regards,
Bengt Richter


