[newbie] trying socket as a replacement for nc

Sun Dec 15 10:15:07 EST 2013

On 2013-12-15, Dan Stromberg <drsalists at gmail.com> wrote:
> On Fri, Dec 13, 2013 at 8:06 AM, Grant Edwards <invalid at invalid.invalid> wrote:
>> On 2013-12-12, Dan Stromberg <drsalists at gmail.com> wrote:

>>>> Just to be pedantic: _TCP_ sockets reserve that right.  UDP sockets
>>>> do not, and do in fact guarantee that each message is discrete.  [It
>>>> appears that the OP is undoubtedly using TCP sockets.]
>>>
>>> I haven't done a lot of UDP, but are you pretty sure UDP can't at
>>> least fragment large packets?  What's a router or switch to do if the
>>> Path MTU isn't large enough for an original packet?
>>>
>>> http://www.gamedev.net/topic/343577-fragmented-udp-packets/
>>
>> You're conflating IP datagrams and Ethernet packets.  The IP stack can
>> fragment an IP datagram into multiple Ethernet packets which are then
>> reassembled by the receiving IP stack into a single datagram before
>> being passed up to the next layer (in this case, UDP).
>
> As long as you're saying this of UDP, I have no problem with it.

That is indeed what I'm saying.  I apologize if that was not clear in
my original posting.

> I've seen TCP fragment and not be reassembled though, which suggests
> to me that the reassembly's happening in UDP rather than IP.

That's something different.  In TCP, there's no guarantee that
reads/writes correspond 1:1 to IP datagrams.  TCP is a _stream_
protocol and there is no semantic meaning attached to the boundaries
between successive read/write calls the way there is with UDP.

> If it's done by IP the same way for UDP and TCP,

The IP layer is supposed to reassemble received datagrams for both --
but that's got nothing to do with atomicity of TCP writes/reads.  The
TCP stack can (and often does) turn one write() call into multiple IP
datagrams.  It can also turn multiple writes into a single IP
datagram.  On the other end, it can split up a single datagram into
multiple read()s and/or combine multiple datagrams into a single
read().  TCP is a stream service, not a datagram service like UDP.
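
A minimal sketch of that behaviour, assuming a loopback TCP connection
and a hypothetical helper named tcp_roundtrip (not from any library).
Each chunk goes out in its own sendall() call; the number and size of
the pieces recv() hands back is up to the TCP stack, and only the byte
sequence is guaranteed:

```python
import socket


def tcp_roundtrip(chunks):
    """Send each chunk with its own sendall() call over loopback TCP and
    return the list of pieces that recv() handed back on the other end."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()
    try:
        for chunk in chunks:
            sender.sendall(chunk)        # one write per chunk
        sender.close()                   # lets the receiver see end-of-stream
        pieces = []
        while True:
            data = receiver.recv(4096)
            if not data:                 # b"" means the peer closed
                break
            pieces.append(data)
        return pieces
    finally:
        sender.close()
        receiver.close()
        listener.close()


if __name__ == "__main__":
    pieces = tcp_roundtrip([b"hello ", b"world"])
    # len(pieces) may be 1, 2 or more; the concatenation is what TCP promises.
    print(len(pieces), b"".join(pieces))
```

The number of pieces can differ from run to run and from machine to
machine, which is the point: code that assumes one recv() per send()
is relying on something TCP never promised.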

> I'd not trust it in UDP either.

The standards all require UDP datagrams to be preserved.  All of the
UDP applications I've ever written or seen depend utterly on that, and
it's always worked that way for me.  If you've seen it fail, then you
ought to file a bug report.
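
For contrast, a small sketch of the UDP side, assuming two UDP sockets
on the loopback interface and a hypothetical helper named
udp_roundtrip.  Each sendto() should come back from recvfrom() as
exactly one message of the same length.  UDP may still drop a
datagram, so the sketch sets a timeout rather than waiting forever:

```python
import socket


def udp_roundtrip(messages):
    """Send each message as one UDP datagram over loopback and return
    what recvfrom() delivered, one entry per datagram."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    receiver.settimeout(2.0)             # UDP can lose datagrams; don't hang
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        received = []
        for msg in messages:
            sender.sendto(msg, receiver.getsockname())
            data, _ = receiver.recvfrom(65535)   # large enough for any datagram
            received.append(data)
        return received
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    for data in udp_roundtrip([b"a" * 10, b"b" * 2000, b"c"]):
        print(len(data))
```

Unlike the TCP case, the lengths that come out match the lengths that
went in, message for message.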

>> Did you read the thread you pointed to?  Your question was answered by
>> posting #4 in the thread you cited:
>>
>>    1) Yes, packets will be fragmented at the network layer (IP), but
>>       this is something you do not have to worry about since the
>>       network layer will reassemble the fragments before passing them
>>       back up to the transport layer (UDP). UDP guarantees preserved
>>       message boundaries, so you never have to worry about only
>>       receiving a packet fragment :~).
>
> Actually, I believe the link I sent (which I skimmed) had people
> coming down on both sides of the matter.  Some said that UDP would be
> fine for small datagrams, while others said it would be fine,
> irrespective of size.

The maximum size of an IP datagram is 64KB, so it's not "fine
irrespective of size".  If your UDP implementation is working correctly
it will be fine below that limit.
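
To put a number on that limit, here is a sketch assuming IPv4 with no
header options: the 16-bit total-length field caps a datagram at 65535
bytes, and subtracting a 20-byte IP header and an 8-byte UDP header
leaves 65507 bytes of payload.  The helper name try_send is
hypothetical.  A payload over the limit is expected to be rejected
outright by the OS rather than partially sent (some systems enforce a
lower default limit than 65507):

```python
import socket

IPV4_MAX_DATAGRAM = 65535   # 16-bit total-length field in the IPv4 header
IPV4_HEADER = 20            # minimum IPv4 header, assuming no options
UDP_HEADER = 8
MAX_UDP_PAYLOAD = IPV4_MAX_DATAGRAM - IPV4_HEADER - UDP_HEADER


def try_send(payload):
    """Send payload as a single UDP datagram over loopback.

    Returns the byte count reported by sendto(), or None if the OS
    refused the datagram (for example because it is too long)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return sender.sendto(payload, receiver.getsockname())
    except OSError:
        return None
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(MAX_UDP_PAYLOAD)
    print(try_send(b"x" * 1000))     # whole datagram accepted
    print(try_send(b"x" * 70000))    # over the IPv4 limit: refused
```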

>> A few other references:
>>
>> http://tools.ietf.org/html/rfc791
>>
>>  1.1. Motivation
>>
>>   [...] The internet protocol provides for transmitting blocks of data
>>   called datagrams from sources to destinations, [...] The internet
>>   protocol also provides for fragmentation and reassembly of long
>>   datagrams, if necessary, for transmission through "small packet"
>>   networks.
>
> I've personally seen this fail to occur in TCP

You can't say that, because there's no correspondence between IP
datagrams and TCP read/write block sizes the way there is in UDP.

With TCP there is nothing to fail (with respect to read/write block
sizes). TCP only guarantees that bytes will get there and get there in
the right order. It doesn't make any promises about block sizes.

> I've seen old time socket programmers explain that it cannot be relied
> upon in TCP; send() and recv() and (read() and write()) are system
> calls that return a length so that you can loop on them until all
> relevant data has been transferred.  They don't return that length
> just so you can ignore it.

That's true, but that's because of the design of the TCP _stream_
protocol, not because the IP datagram layer doesn't work right.
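
A minimal sketch of the looping that stream sockets call for.  The
helper names send_all and recv_exactly are illustrative, not part of
the socket module (which already provides sendall() for the send
side).  The usage part assumes socket.socketpair() as a stand-in for a
connected TCP socket; it is a stream socket with the same
partial-transfer semantics:

```python
import socket


def send_all(sock, data):
    """Call send() until every byte of data has been handed to the stack."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        sent = sock.send(view[total:])   # may accept fewer bytes than offered
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total


def recv_exactly(sock, n):
    """Call recv() until exactly n bytes have arrived, or raise on EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)     # may return fewer bytes than asked
        if not chunk:
            raise ConnectionError("peer closed before %d bytes arrived" % n)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    a, b = socket.socketpair()           # connected pair of stream sockets
    try:
        send_all(a, b"x" * 5000)
        print(len(recv_exactly(b, 5000)))
    finally:
        a.close()
        b.close()
```

Because a stream has no message boundaries, the receiver has to know
how many bytes to expect, for example from a length prefix or a
delimiter agreed on by both ends.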

> From the Socket HOWTO
> (http://docs.python.org/2/howto/sockets.html#socket-howto) : Now we
> come to the major stumbling block of sockets - send and recv operate
> on the network buffers. They do not necessarily handle all the bytes
> you hand them (or expect from them), because their major focus is
> handling the network buffers. In general, they return when the
> associated network buffers have been filled (send) or emptied (recv).
> They then tell you how many bytes they handled. It is your
> responsibility to call them again until your message has been
> completely dealt with.

If that's true for UDP, then the Python UDP implementation is broken,
and somebody should file a bug.  UDP is a _datagram_ service.
Either all the bytes in a write() should get sent or none of them.
Sending a partial datagram is _not_ a valid option.

-- 
Grant