object references/memory access

Tue Jul 3 14:53:44 EDT 2007

On Jul 2, 10:57 pm, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> >>> I have found the stop-and-go between two processes on the same machine
> >>> leads to very poor throughput. By stop-and-go, I mean the producer and
> >>> consumer are constantly getting on and off of the CPU since the pipe
> >>> gets full (or empty for consumer). Note that a producer can't run at
> >>> its top speed as the scheduler will pull it out since its output pipe
> >>> got filled up.
>
> > On a single core CPU when only one process can be running, the
> > producer must get off the CPU so that the consumer may start the
> > draining process.
>
> It's still not clear why you say that the producer can run "at its top
> speed". You seem to be suggesting that in such a setup, the CPU would
> be idle, i.e. not 100% loaded. Assuming that the consumer won't block
> for something else, then both processes will run at their "top speed".
> Of course, for two processes running at a single CPU, the top speed
> won't be the MIPs of a single processor, as they have to share the CPU.
>
> So when you say it leads to very poor throughput, I ask: compared
> to what alternative?

Let's assume two processes P and C. P is the producer of data; C, the
consumer.
To answer your specific question: compared to running P to completion
and then running C to completion. The less optimal way is
p1-->c1-->p2-->c2-->...-->p_n-->c_n, where p_i is a time-slice when P is
on the CPU and c_i is a time-slice when C is on the CPU.

If the problem does not require two-way communication, as is typical
of a producer-consumer setup, it is a lot faster to let P run fully
before C is started.

If P and C are tied together by a pipe, then in most Linux-like OSes
(QNX may be doing something really smart, as noted by John Nagle) there
is a big cost in the scheduler constantly swapping P and C on and off
the CPU. Why? Because the data flowing between P and C passes through a
small, finite space (the pipe buffer). Once P fills it, P blocks; the
scheduler sees that C is runnable and puts C on the CPU. Once C drains
it, the same thing happens in the other direction.
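
To make the "small finite space" concrete, here is a minimal sketch that
measures how many bytes a fresh pipe accepts before a write would block.
It assumes a POSIX-like system and Python 3.5+ (for os.set_blocking);
pipe_capacity is a name I made up for illustration, and the number it
reports depends on the OS and its configuration.

```python
import os

def pipe_capacity(chunk_size=4096):
    """Return how many bytes fit in a fresh pipe before a write would block."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode: a full pipe raises BlockingIOError
        # instead of putting the writer to sleep.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # Nobody is reading, so the buffer is now full.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)

if __name__ == "__main__":
    print("pipe holds about", pipe_capacity(), "bytes before the writer blocks")
```

A blocking producer would be descheduled at exactly the point where this
sketch gets BlockingIOError, which is the hand-off to C described above.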

Thus even if the CPU is 100% busy, useful work is not 100%; the
process-swap overhead can kill performance.
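
A back-of-the-envelope way to see this is the toy model below. It is
only a sketch under my own assumptions: every buffer fill costs one
switch from P to C and one back, each switch has a fixed cost, and
nothing else interferes. The function name and all parameters are
hypothetical, and any numbers you feed it are illustrations, not
measurements.

```python
def useful_fraction(work_seconds, total_bytes, buffer_bytes, switch_seconds):
    """Fraction of busy CPU time spent on real work in a toy model.

    Assumption: each time the buffer fills, the scheduler switches
    P -> C, and once it is drained, C -> P (two switches per fill).
    """
    fills = total_bytes // buffer_bytes
    switches = 2 * fills
    overhead = switches * switch_seconds
    return work_seconds / (work_seconds + overhead)

if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the effect.
    print(useful_fraction(work_seconds=1.0, total_bytes=10**9,
                          buffer_bytes=65536, switch_seconds=5e-6))
```

In this model, a bigger buffer or a cheaper switch pushes the fraction
toward 1.0, and so does running P to completion first, since that
corresponds to very few switches.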

When we use an intermediate file to capture the data, we allow P to
run for a much longer time-slice. Assuming huge file-system buffering,
it's quite possible that P gets the CPU in one go and finishes its job
of data generation.
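
If you want to try the comparison yourself, here is a rough sketch that
runs the same producer once through a pipe and once through a temporary
file, and times both. The producer script, run_piped, run_via_file and
consume are all names I made up for illustration, and the consumer here
only counts bytes. I am not claiming which variant wins: that depends on
the OS, the scheduler, the number of cores and how much work P and C
really do.

```python
import subprocess
import sys
import tempfile
import time

# Hypothetical producer: writes N bytes to stdout in 4 KiB pieces.
PRODUCER = (
    "import sys\n"
    "n = int(sys.argv[1])\n"
    "out = sys.stdout.buffer\n"
    "chunk = b'x' * 4096\n"
    "while n > 0:\n"
    "    piece = chunk[:n]\n"
    "    out.write(piece)\n"
    "    n -= len(piece)\n"
)

def consume(stream):
    """Stand-in consumer: read everything and count the bytes."""
    total = 0
    while True:
        block = stream.read(4096)
        if not block:
            return total
        total += len(block)

def run_piped(n_bytes):
    """P and C run concurrently, tied together by a pipe."""
    start = time.perf_counter()
    proc = subprocess.Popen([sys.executable, "-c", PRODUCER, str(n_bytes)],
                            stdout=subprocess.PIPE)
    total = consume(proc.stdout)
    proc.stdout.close()
    proc.wait()
    return total, time.perf_counter() - start

def run_via_file(n_bytes):
    """P runs to completion into a file, then C reads the file."""
    start = time.perf_counter()
    with tempfile.TemporaryFile() as tmp:
        subprocess.run([sys.executable, "-c", PRODUCER, str(n_bytes)],
                       stdout=tmp, check=True)
        tmp.seek(0)  # the child moved the shared file offset to the end
        total = consume(tmp)
    return total, time.perf_counter() - start

if __name__ == "__main__":
    n = 50 * 1024 * 1024
    print("pipe:", run_piped(n))
    print("file:", run_via_file(n))
```

Each function returns the byte count and the elapsed wall-clock time, so
the two runs can be compared on the same machine and data size.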

Note that all of this becomes invalid if you have more than one core
and the scheduler can keep both P and C running on two cores
simultaneously. In that case we don't incur this process-swap
overhead, and we may not see the stop-n-go performance drop.

Thanks,
Karthik

>
> Regards,
> Martin