Premature wakeup of time.sleep()

Bengt Richter bokr at oz.net
Tue Sep 13 15:58:38 EDT 2005


On Mon, 12 Sep 2005 19:33:07 -0400, Peter Hansen <peter at engcorp.com> wrote:

>Steve Horsley wrote:
>> I think the sleep times are quantised to the granularity of the system 
>> clock, which varies from OS to OS. From memory, Windows 95 has a 55 ms 
>> timer, NT is less (19 ms?), Linux and Solaris 1 ms. All this is from 
>
>For the record, the correct value for the NT/XP family is about 15.6 ms 
>(possibly exactly 64 ticks per second and thus 15.625 ms, but offhand I don't 
>recall, though I'm sure Google does).
>
What "correct value" are you referring to? The clock chip interrupts,
clock chip resolution, or OS scheduling/dispatching?

For NT4 at least, I don't know of anything near 15.6 ms ;-)
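One way to see what a given box actually does is just to measure it. A rough
sketch (time.time() itself has platform-dependent resolution, so take the
numbers with a grain of salt):

import time

# Ask for a very short sleep repeatedly and look at what we actually
# get back; on a quantised system the results cluster near the tick.
def measure_sleep(requested=0.001, samples=20):
    elapsed = []
    for _ in range(samples):
        start = time.time()
        time.sleep(requested)
        elapsed.append(time.time() - start)
    return min(elapsed), max(elapsed)

lo, hi = measure_sleep()
print("asked for 1 ms; got between %.2f ms and %.2f ms"
      % (lo * 1000.0, hi * 1000.0))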

Speaking generally (based on what I've done myself implementing
time-triggered wakeups of suspended tasks in a custom kernel eons ago),
a sleep-a-while function is not as desirable as a sleep-until function,
because the first is relative: it needs to get the current time, add the
delta, and then use the sleep-until functionality anyway. So if there
is a timing glitch during the sleep-a-while call, there will be a glitch
in the wake-up time.
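In Python terms, something like this (just a sketch; sleep_until is a made-up
helper, since time.sleep() only offers the relative form):

import time

def sleep_until(deadline):
    # Sleep until an absolute time.time() deadline. Because the
    # underlying sleep is relative, we re-read the clock and loop:
    # an early wakeup (e.g. from a signal) then costs another short
    # sleep instead of shifting the wake-up time.
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            return              # deadline already past
        time.sleep(remaining)

def sleep_for(delta):
    # The relative form, built on the absolute one rather than
    # the other way around.
    sleep_until(time.time() + delta)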

Once you have a wakeup time, you can put your sleeper in a time-ordered queue,
at which point you might or might not check whether the wake-up time is already
past, which could happen. If it is, you can put the task in a ready queue for
the OS to run again when it gets around to it, or, since the sleeper is the
caller, you have the option of a hot return as if the sleep call hadn't happened.
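A toy model of such a queue, in Python rather than kernel C (heapq stands in
for whatever ordered structure a real kernel would use; all the names here
are made up):

import heapq
import itertools
import time

wakeup_queue = []           # heap of (wakeup_time, seq, task) entries
_seq = itertools.count()    # tie-breaker so equal times never compare tasks

def queue_sleeper(task, wakeup_time):
    # Returns True if the wake-up time is already past, in which
    # case the caller can do the "hot return" described above.
    if wakeup_time <= time.time():
        return True
    heapq.heappush(wakeup_queue, (wakeup_time, next(_seq), task))
    return False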

Assuming you queue the task, when is the queue going to be checked to see
if it's time to wake anyone up? Certainly when the next OS time slice is
up. That might be ~55 ms, or 10 ms (NT4), or 1 ms (modern OS & processor). At
that point (roughly on the strike of e.g. 10 ms, depending on how many badly
behaved high-priority interrupt service routines there are that glitch it by
suspending interrupts too long or queueing too much super-priority deferred
processing), we look at the wake-up queue, and a number of tasks might get
woken. Some will have had wake-up times that were really just microseconds
after they were queued, but it has taken 10 ms (or whatever) for the OS to
finish another task's time slice and check, so the effect is to wake up on the
OS's basic slicing time. Waking up only means being queued for resumption,
however, so there might be more delay, depending on relative priorities etc.

Some OSes might look at the time queue at every opportunity, meaning any time
there is any interrupt. The check can be fast, since it is just a comparison
of the current time vs. the first wake-up time. (With the right hardware, this
check can be done in the hardware, which can interrupt when a specific wake-up
time passes. You'd think CPUs would have this built in by now.) But in any
case, waking up only means being put in the ready-to-resume queue, not
necessarily being run immediately, so you have variability in the timing.
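In the same toy model, the per-tick check might look like this (again just a
sketch; a real kernel does this in its interrupt path, not in Python):

import heapq
import time

def on_tick(wakeup_queue, ready_queue):
    # Cheap check: only the earliest wake-up time needs comparing
    # against "now". Everything due gets moved to the ready queue;
    # it is readied, not run, and the scheduler resumes it when it
    # gets around to it.
    now = time.time()
    while wakeup_queue and wakeup_queue[0][0] <= now:
        wakeup_time, seq, task = heapq.heappop(wakeup_queue)
        ready_queue.append(task)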

Note that when you wake up on the OS time-slicing interval edge, you are synchronized
with that, and if a number of tasks get readied at the "same" time, they will run
in succession. Sometimes a pattern of succession will be stable for a long time,
due to the order of calls and the structure of the queues etc., so that stable
but different relative delays may be observed across the multiple tasks.

Add in adaptive priority tuning for foreground/background tasks, etc., and you
can see why sleep results could be less predictable as you move away from a
dedicated-machine context for your program.

UIAM the clock chip commonly used since the early PC days derives from IBM's
decision to buy a commodity crystal-based oscillator from the CRT/TV world
(the 14.31818 MHz crystal, four times the NTSC colorburst frequency; divided
by 12 it gives the 1.193182 MHz timer input) and divide it down, rather than
spend more to spec something with a nice 10**something Hz frequency that most
programmers would likely have preferred.

For a PC, the Linux kernel sources have a constant "CLOCK_TICK_RATE"; e.g.,
see
    http://lxr.linux.no/source/include/asm-i386/timex.h

where it has

    #ifdef CONFIG_X86_ELAN
    #  define CLOCK_TICK_RATE 1189200 /* AMD Elan has different frequency! */
    #else
    #  define CLOCK_TICK_RATE 1193182 /* Underlying HZ */
    #endif

The second define is typical for i386 but was apparently revised
from the rounder number 1193180 at some point. IIRC the number in my old
Compaq (a 16 MHz 386, back then ;-) BIOS technical reference docs ended in zero ;-)

Anyway, if you run that into a 16-bit counter and trigger an interrupt every
time the counter turns over, you get the old familiar 18.206512 Hz (1193182/2**16),
i.e. the 54.9254 ms tick of DOS and early Windows. Or you could trigger at some
other count for other tick sizes.
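Python makes a handy calculator for checking that arithmetic (the ~10 ms
divisor at the end is just an illustration):

CLOCK_TICK_RATE = 1193182            # PIT input frequency, Hz

print("%.6f Hz" % (CLOCK_TICK_RATE / 65536.0))            # ~18.206512 Hz
print("%.4f ms" % (65536.0 / CLOCK_TICK_RATE * 1000.0))   # ~54.9254 ms

# Picking a divisor for some other tick size, e.g. roughly 10 ms:
divisor = int(round(CLOCK_TICK_RATE * 0.010))
print("%d -> %.4f ms" % (divisor, divisor * 1000.0 / CLOCK_TICK_RATE))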

BTW, the CMOS clock that keeps time on battery power while your PC's power is
off is a separate thing, and it has a resolution of 1 second. So even though
the OS may keep time with the timer chip, the time it starts counting from
when it boots falls on a whole second, and could be a second off even if the
CMOS clock were dead accurate, unless the OS goes to the net for a more
accurate time. Usually drift swamps the one-second resolution in short order
anyway (a problem for networked parallel makes, BTW?).

Enough rambling ...

Regards,
Bengt Richter


