Lightweight socket IO wrapper

Jorgen Grahn grahn+nntp at snipabacken.se
Mon Sep 21 07:25:21 EDT 2015


On Mon, 2015-09-21, Dennis Lee Bieber wrote:
> On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris"
> <james.harris.1 at gmail.com> declaimed the following:

...
>>I thought UDP would deliver (or drop) a whole datagram but cannot find 
>>anything in the Python documentation to guarantee that. In fact 
>>documentation for the send() call says that apps are responsible for 
>>checking that all data has been sent. They may mean that to apply to 
>>stream protocols only but it doesn't state that. (Of course, UDP 
>>datagrams are limited in size so the call may validly indicate 
>>incomplete transmission even when the first part of a big message is 
>>sent successfully.)
>>
> 	Looking in the wrong documentation <G> 
>
> 	You probably should be looking at the UDP RFC. Or maybe just
>
> http://www.diffen.com/difference/TCP_vs_UDP
>
> """
> Packets are sent individually and are checked for integrity only if they
> arrive. Packets have definite boundaries which are honored upon receipt,
> meaning a read operation at the receiver socket will yield an entire
> message as it was originally sent.
> """
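To see the boundary preservation described in that quote, here is a minimal Python sketch. It assumes the loopback interface, where nothing is lost or reordered (on a real network either can happen). It sends two datagrams and reads them back one recvfrom() at a time:

```python
import socket

# Two UDP sockets on loopback; port 0 lets the OS pick a free port.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(1.0)
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Two separate sendto() calls produce two separate datagrams ...
tx.sendto(b"first", rx.getsockname())
tx.sendto(b"second", rx.getsockname())

# ... and each recvfrom() returns exactly one of them, even though the
# 4096-octet buffer would have room for both.
msg1, _ = rx.recvfrom(4096)
msg2, _ = rx.recvfrom(4096)
print(msg1, msg2)  # b'first' b'second'

tx.close()
rx.close()
```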
>
> 	Even if the IP layer has to fragment a UDP packet to meet limits of the
> transport media, it should put them back together on the other end before
> passing it up to the UDP layer. To my knowledge, UDP does not have a size
> limit on the message (well -- a 16-bit length field in the UDP header).

So they are "limited in size" like the OP wrote.  (A TCP stream OTOH is
potentially infinite.)
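The arithmetic behind that limit, as a minimal Python sketch. The 16-bit UDP length field alone would allow 65535 - 8 octets of payload, but over IPv4 the whole IP datagram, including its (minimum) 20-octet header, must also fit in 65535 octets. The sketch also touches the OP's worry about partial sends: for a datagram socket, sendto() either sends the whole datagram and returns its length or fails with OSError. It assumes IPv4 on loopback; some systems enforce a lower default maximum, so sendto() may start failing well below 65507 octets.

```python
import socket

UDP_HEADER = 8    # octets: source port, destination port, length, checksum
IPV4_HEADER = 20  # octets: minimum IPv4 header, no options

# The IPv4 total-length field (16 bits) covers IP header + UDP header +
# payload, which is tighter than the UDP length field's own limit.
MAX_UDP_PAYLOAD_IPV4 = 0xFFFF - IPV4_HEADER - UDP_HEADER  # 65507

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# A datagram that fits is sent whole: sendto() returns its full length.
sent = tx.sendto(b"x" * 1000, rx.getsockname())
print(sent)  # 1000

# One octet over the IPv4 limit cannot be sent at all, not even partially.
try:
    tx.sendto(b"x" * (MAX_UDP_PAYLOAD_IPV4 + 1), rx.getsockname())
    oversize_rejected = False
except OSError:  # typically EMSGSIZE, "Message too long"
    oversize_rejected = True
print(oversize_rejected)  # True

tx.close()
rx.close()
```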

But also, the IPv4 RFC says:

    All hosts must be prepared to accept datagrams of up to 576 octets
    (whether they arrive whole or in fragments).  It is recommended
    that hosts only send datagrams larger than 576 octets if they have
    assurance that the destination is prepared to accept the larger
    datagrams.
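Applied to UDP, the RFC's 576 octets leaves room for a fairly small payload. A sketch of the arithmetic, taking the worst case of a 60-octet IP header (20 octets plus the maximum 40 octets of options). The check_payload() helper is hypothetical, just one way a sender might enforce this conservative limit:

```python
IPV4_MIN_REASSEMBLY = 576  # octets every IPv4 host must accept (RFC 791)
IPV4_MAX_HEADER = 60       # octets: 20-octet header plus up to 40 of options
UDP_HEADER = 8             # octets

# Largest UDP payload that fits in 576 octets even with a maximal IP header.
SAFE_UDP_PAYLOAD = IPV4_MIN_REASSEMBLY - IPV4_MAX_HEADER - UDP_HEADER

def check_payload(payload: bytes) -> bytes:
    """Hypothetical guard: reject payloads that need more than 576 octets."""
    if len(payload) > SAFE_UDP_PAYLOAD:
        raise ValueError(f"{len(payload)} octets exceeds the conservative "
                         f"limit of {SAFE_UDP_PAYLOAD}")
    return payload

print(SAFE_UDP_PAYLOAD)  # 508
```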

As for "all or nothing" with UDP datagrams, you also have the socket
layer case where the user does read() into a 1000 octet buffer and the
datagram was 1200 octets.  With BSD sockets you can (if you try)
detect this, but the extra 200 octets are lost forever.
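A sketch of how that detection looks from Python on a Unix-like system (socket.recvmsg() is not available on Windows). The kernel sets MSG_TRUNC in the returned flags, and the next read yields the next datagram rather than the missing 200 octets. It assumes loopback delivery, so both datagrams arrive in order:

```python
import socket

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(1.0)
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

tx.sendto(b"a" * 1200, rx.getsockname())  # a 1200-octet datagram ...
tx.sendto(b"next", rx.getsockname())      # ... followed by a short one

# recvmsg() returns the message flags along with the data; MSG_TRUNC means
# the datagram was longer than the buffer and the excess was discarded.
data, _ancdata, flags, _addr = rx.recvmsg(1000)
truncated = bool(flags & socket.MSG_TRUNC)
print(len(data), truncated)  # 1000 True

# The missing 200 octets are gone: the next read returns the next datagram.
following, _ = rx.recvfrom(1000)
print(following)  # b'next'

tx.close()
rx.close()
```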

> But since it /is/ "got it all" or "dropped" with no inherent confirmation, one
> would have to embed their own protocol within it -- sequence numbers with
> ACK/NAK, for example. Problem: if using LARGE UDP packets, this protocol
> would mean having LARGE resends should packets be dropped or arrive out of
> sequence (and since the ACK/NAK could be dropped too, you may have to
> handle the case of a duplicated packet -- also large).
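A minimal Python sketch of the duplicate-handling part of such a protocol. The 4-octet sequence-number header and the Receiver class are hypothetical, not any standard format. ACK/NAK sending, retransmission timers and reordering are left out (a real receiver would still re-ACK a duplicate so the sender stops resending), and a real receiver would keep a bounded window rather than an ever-growing set of seen numbers:

```python
import struct

HEADER = struct.Struct("!I")  # hypothetical header: 4-octet big-endian sequence number

def wrap(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with its sequence number (taken mod 2**32)."""
    return HEADER.pack(seq & 0xFFFFFFFF) + payload

class Receiver:
    """Hypothetical receiver that delivers each sequence number at most once."""

    def __init__(self) -> None:
        self.seen = set()

    def accept(self, datagram: bytes):
        """Return (seq, payload) for a new datagram, None otherwise."""
        if len(datagram) < HEADER.size:
            return None  # too short to carry a header: drop it
        (seq,) = HEADER.unpack_from(datagram)
        if seq in self.seen:
            return None  # duplicate, e.g. a resend after a lost ACK
        self.seen.add(seq)
        return seq, datagram[HEADER.size:]

receiver = Receiver()
print(receiver.accept(wrap(1, b"hello")))  # (1, b'hello')
print(receiver.accept(wrap(1, b"hello")))  # None
print(receiver.accept(wrap(2, b"world")))  # (2, b'world')
```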
>
> 	TCP is a stream protocol -- the protocol will ensure that all data
> arrives, and that it arrives in order, but does not enforce any boundaries
> on the data; what started as a relatively large packet at one end may
> arrive as lots of small packets due to intermediate transport limits (one
> can visualize a worst case: each TCP packet is broken up to fit Hollerith
> cards; 20 bytes for header and 60 bytes of data -- then fed to a reader and
> sent on AS-IS).

The problem is IMO more this: the chunks of data that the application
writes don't map to the chunks the other application reads.  In the lower
layers, I don't expect TCP segments to be split, and IP fragmentation
(if it happens at all) operates at an even lower level.

However the end result is still just as you write:

> Boundaries are the end-user responsibility... line endings
> (look at SMTP, where an email message ends on a line containing just a ".")
> or embedded length counter (not the TCP packet length).
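A minimal Python sketch of the embedded-length-counter approach. The 4-octet big-endian length prefix is an assumption (any fixed-size header works), and socket.socketpair() stands in for a TCP connection since both are byte streams. recv_exact() is the part that copes with writes and reads not mapping onto each other:

```python
import socket
import struct

LENGTH = struct.Struct("!I")  # assumed framing: 4-octet big-endian length prefix

def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed message; sendall() retries partial sends."""
    sock.sendall(LENGTH.pack(len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n octets, however the stream happens to be chunked."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed the connection mid-message
            raise ConnectionError(f"EOF after {len(buf)} of {n} octets")
        buf += chunk
    return bytes(buf)

def recv_message(sock: socket.socket) -> bytes:
    """Read the length prefix, then exactly that many octets of payload."""
    (length,) = LENGTH.unpack(recv_exact(sock, LENGTH.size))
    return recv_exact(sock, length)

# socketpair() returns two connected stream sockets, standing in for TCP.
a, b = socket.socketpair()
for payload in (b"first", b"", b"third"):
    send_message(a, payload)
messages = [recv_message(b) for _ in range(3)]
print(messages)  # [b'first', b'', b'third']
a.close()
b.close()
```

Note that an empty message stays distinct from end-of-stream here, because the prefix still arrives.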
>
>>Receiving no bytes is taken as indicating the end of the communication. 
>>That's OK for TCP but not for UDP so there should be a way to 
>>distinguish between the end of data and receiving an empty datagram.
>>
> 	I don't believe UDP supports a truly empty datagram (length of 0) --
> presuming a sending stack actually sends one, the receiving stack will
> probably drop it as there is no data to pass on to a client

UDP datagrams of length 0 work (just tried it on Linux).  There's
nothing special about it.
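Reproducing that in Python, assuming a loopback setup: an empty UDP datagram comes back from recvfrom() as b'' together with the sender's address, while on a stream socket b'' from recv() is what end-of-stream looks like. So a UDP application cannot treat an empty read as the end of communication; that has to be signalled in its own protocol (or inferred from a timeout).

```python
import socket

# UDP: an empty datagram arrives as b"" *plus* the sender's address.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(1.0)
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.bind(("127.0.0.1", 0))
tx.sendto(b"", rx.getsockname())
udp_data, sender = rx.recvfrom(1024)
from_tx = sender == tx.getsockname()
print(udp_data, from_tx)  # b'' True

# Stream socket: b"" from recv() means the peer has shut down its sending side.
a, b = socket.socketpair()
a.shutdown(socket.SHUT_WR)
stream_data = b.recv(1024)
print(stream_data)  # b''

for s in (rx, tx, a, b):
    s.close()
```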

> (there is a PR
> at work because we have a UDP driver that doesn't drop 0-length messages,
> but also can't deliver them -- so the circular buffer might fill with
> undeliverable headers)

Those messages should be delivered to the receiving socket, in the
sense that they are sanity-checked, used to wake up the application
and mark the socket readable, fill up one entry in the read queue and
so on ...
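A small Python sketch of that behaviour, assuming a loopback setup: an empty datagram is enough to make select() report the socket readable, and reading it consumes one queue entry, after which the socket is no longer readable.

```python
import select
import socket

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Nothing queued yet: a zero-timeout select() reports no readable sockets.
before, _, _ = select.select([rx], [], [], 0)

# An empty datagram alone wakes up a select() waiting on the socket.
tx.sendto(b"", rx.getsockname())
readable, _, _ = select.select([rx], [], [], 1.0)
woke = readable == [rx]

# Reading returns the empty payload and consumes that one queue entry ...
data = rx.recv(1024)
# ... after which the socket is no longer readable.
after, _, _ = select.select([rx], [], [], 0)

print(before, woke, data, after)  # [] True b'' []
tx.close()
rx.close()
```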

Of course your system at work may have the right to be more
restrictive, if it's special-purpose.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .


