Any clues to source of this delay?

Tue Aug 3 04:38:17 EDT 1999

Ok, as I've noted in some earlier posts, I have a nasty tendency of
prototyping some protocols in Python for occasional implementation in more
'robust', 'commercially feasible' languages. I've run into an oddity this
time, though:

Background:

The protocol involves encapsulating some data as 'packets' on a TCP
connection, somewhat like the Record layer of TLSv1. The simple protocol is
running on top of TCP; this, in itself, is nothing particularly new or
fancy. The _problem_, however, seems to be.
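
As a rough illustration of the kind of framing described above (this is not
the actual protocol code; the 1-byte type plus 2-byte length header and the
names send_packet, recv_exact and recv_packet are assumptions, loosely
modeled on a TLS-style record header), a length-prefixed packet layer over a
stream socket might look like this:

```python
import socket
import struct

# Hypothetical record header: 1-byte type, 2-byte big-endian payload length.
HEADER = struct.Struct("!BH")


def send_packet(sock, ptype, payload):
    """Write header and payload with a single sendall() call."""
    sock.sendall(HEADER.pack(ptype, len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed connection mid-packet")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_packet(sock):
    """Read one complete packet and return (type, payload)."""
    ptype, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return ptype, recv_exact(sock, length)
```

Note that the sketch sends header and payload in one call; whether the real
implementation does so is not stated here.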

Problem:

On Linux, single connections have a 'glass ceiling' of 50 roundtrips/second;
no matter what I kludge, it seems to stay there. The only way to fix this was
to send 'empty' packets after the processed ones, thus increasing throughput
by about 5X.

On NT, the glass ceiling is the same, just at 1/10 of Linux's (5
roundtrips/second).

I _assume_ this is some TCP/Python-related feature; no traffic other than
the one mentioned occurs during the test period. Additionally, for pure
TCP, much higher throughput with small packets can be achieved:

P/TCP:snd+rcv                :    2110.1495/sec [1.39s] (473.90us/call)

However, with the protocol around it, the times suddenly die, literally:

Single echo(c:Up,exit:Normal)   :      23.6462/sec [4.06s] (42.290ms/call)

And the CPU spent is about 2%, thus the delays are .. somewhere. Any
insights on where? The Linux performance is sufficient, but NT performance
most definitely is _NOT_.
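
One way to narrow this down might be a small loopback harness that measures
roundtrips/second with a few knobs. The sketch below is hypothetical and is
not the benchmark that produced the numbers above. It assumes, without
evidence from this test, that two things are worth varying: whether each
message goes out in one write or two (split_writes), and whether TCP_NODELAY
is set (nodelay). Small writes interacting with Nagle's algorithm and delayed
ACKs are a known source of fixed per-roundtrip stalls, but whether that is
the cause here is only a guess.

```python
import socket
import threading
import time


def echo_server(listener, nodelay):
    """Accept one connection and echo everything back until it closes."""
    conn, _ = listener.accept()
    with conn:
        if nodelay:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def measure_roundtrips(count=100, size=32, split_writes=False, nodelay=False):
    """Return roundtrips per second for small messages over loopback TCP."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener, nodelay))
    server.start()

    client = socket.create_connection(listener.getsockname())
    if nodelay:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(count):
        if split_writes:
            # Two small writes per message, e.g. header then body.
            client.sendall(payload[:3])
            client.sendall(payload[3:])
        else:
            client.sendall(payload)
        received = 0
        while received < size:
            chunk = client.recv(4096)
            if not chunk:
                raise ConnectionError("echo server closed the connection")
            received += len(chunk)
    elapsed = time.perf_counter() - start

    client.close()
    server.join()
    listener.close()
    return count / elapsed
```

Comparing measure_roundtrips(split_writes=True) against
measure_roundtrips(split_writes=True, nodelay=True) on each platform would
show whether those knobs move the ceiling at all; loopback results may differ
from a real network.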

For reference, here's where it blocks in the Linux case:

+---------+-------------------+-------+--------------------------------------+
|Function |         Time spent|# Calls|                               Percent|
+---------+-------------------+-------+--------------------------------------+
|recv     |     79ms and 223us|    510|*                                     |
|send     |              354ms|    512|**                                    |
|select   |9s, 564ms and 750us|    530|***************95.08%***************  |
+---------+-------------------+-------+--------------------------------------+
+-------+--------------------------------------------------------------------+
|Threads|                                                            Coverage|
+-------+--------------------------------------------------------------------+
|1      |***************************90.11%****************************       |
+-------+--------------------------------------------------------------------+

Therefore it seems as if it just waits for the pending data in the select
about 90% of the time. That does not, by my definition, seem to be 'good'.

That reminds me, is there a tool for blockage coverage? CPU-use coverage is
rarely as interesting as 'where code spends its time', to me, at any rate.
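
For what it's worth, a crude form of 'blockage coverage' can be had by
wrapping the blocking calls and accumulating wall-clock time instead of CPU
time. The sketch below is a minimal illustration under that assumption; it is
not the tool that produced the table above, and the BlockingProfile class and
its method names are made up for the example.

```python
import time
from collections import defaultdict


class BlockingProfile:
    """Accumulate wall-clock time (not CPU time) spent inside wrapped calls."""

    def __init__(self):
        self.total = defaultdict(float)  # name -> seconds spent in the call
        self.calls = defaultdict(int)    # name -> number of calls

    def wrap(self, name, func):
        """Return a version of func whose elapsed time is recorded under name."""
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.total[name] += time.perf_counter() - start
                self.calls[name] += 1
        return timed

    def report(self):
        """Return (name, seconds, calls, percent) rows, slowest first.

        Percent is relative to the time measured in wrapped calls only.
        """
        grand = sum(self.total.values()) or 1.0
        rows = [(name, secs, self.calls[name], 100.0 * secs / grand)
                for name, secs in self.total.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

Usage would be along the lines of timed_select = profile.wrap("select",
select.select), then calling timed_select wherever select.select was called,
and printing profile.report() at the end of the run.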

-Markus Stenberg

-- 
 Running Windows on a Pentium is like having a brand new Porsche but
 only be able to drive backwards with the handbrake on.
	(Unknown source)