Lightweight socket IO wrapper

James Harris james.harris.1 at gmail.com
Tue Sep 22 15:45:24 EDT 2015


"Dennis Lee Bieber" <wlfraed at ix.netcom.com> wrote in message 
news:mailman.12.1442794762.28679.python-list at python.org...
> On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris"
> <james.harris.1 at gmail.com> declaimed the following:
>
>
>>
>>There are a few things and more crop up as time goes on. For example,
>>over TCP it would be helpful to have a function to receive a specific
>>number of bytes or one to read bytes until reaching a certain delimiter
>>such as newline or zero or space etc. Even better would be to be able to
>>use the iteration protocol so you could just code next() and get the
>>next such chunk of read in a for loop. When sending it would be good to
>>just say to send a bunch of bytes but know that you will get told how
>>many were sent (or didn't get sent) if it fails. Sock.sendall() doesn't
>>do that.
>
> Note that the "buffer size" option on a TCP socket.recv() gives you
> your "specific number of bytes" -- if available at that time.

"If" is a big word!

AIUI the buffer size is not guaranteed to relate to the number of bytes 
returned except that you won't/shouldn't(!) get more than the buffer 
size.
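
To illustrate the point, here is a minimal sketch of a "receive exactly n bytes" helper for a stream socket. The name recv_exactly and the choice of EOFError are assumptions made for the example, not part of the socket module; it relies only on recv() returning at most the requested number of bytes and returning b'' once the peer has closed the connection.

```python
import socket


def recv_exactly(sock, n):
    """Return exactly n bytes from a stream socket, or raise EOFError.

    Hypothetical helper: loops because recv() may return fewer bytes
    than requested even when more are still on the way.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # never more than `remaining` bytes
        if not chunk:  # b'' means the peer closed the connection
            raise EOFError(
                "connection closed with %d bytes outstanding" % remaining
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Because it never asks recv() for more than is still outstanding, any bytes that follow the requested block stay in the socket for the next call.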

> I wouldn't want to use .recv(1) though to implement your "reaching a
> certain delimiter"... Much better to read as much as available and search
> it for the delimiter.

Yes, that's what I do at the moment. I keep a block of bytes, add any 
new stuff to it and scan it for delimiters.
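
Roughly, that approach can be sketched as a generator, which also gives the iteration-protocol behaviour mentioned earlier. The name iter_delimited, the default delimiter and the buffer size are assumptions for illustration only.

```python
import socket


def iter_delimited(sock, delimiter=b"\n", bufsize=4096):
    """Yield delimiter-terminated chunks from a stream socket.

    Hypothetical helper: keeps a block of bytes, appends whatever
    recv() returns, and scans the block for the delimiter. The
    delimiter is left on each chunk.
    """
    buf = b""
    while True:
        pos = buf.find(delimiter)
        while pos != -1:
            end = pos + len(delimiter)
            yield buf[:end]
            buf = buf[end:]
            pos = buf.find(delimiter)
        data = sock.recv(bufsize)
        if not data:  # peer closed the connection
            if buf:
                yield buf  # trailing bytes with no delimiter
            return
        buf += data
```

It can be used directly in a for loop, or stepped with next() on the generator object.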

> I'll confess, adding a .readln() FOR TCP ONLY, might
> be a nice extension over BSD sockets (might need to allow option for
> whether line-ends are Internet standard <cr><lf> or some other marker, and
> whether they should be converted upon reading to the native format for the
> host).

Akira Li pointed out that there is just such an extension: makefile. 
Scanning to <lf> is what I do just now as that includes <cr><lf> too and 
I leave them on the string. IIRC file.readline works in the same way.
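
For comparison, a small sketch of the makefile route. socket.makefile() is the real standard-library call; the wrapper name read_lines is just an assumption for the example. In binary mode the file object splits on <lf> only and leaves the line ending on each line, so <cr><lf> comes through intact.

```python
import socket


def read_lines(sock):
    """Yield lines from a blocking stream socket via socket.makefile().

    Hypothetical wrapper: each line keeps its terminator, and a final
    unterminated fragment is yielded as-is when the peer closes.
    """
    with sock.makefile("rb") as f:
        for line in f:
            yield line
```

One caveat: the file object does its own buffering, so mixing it with direct recv() calls on the same socket can lose track of data that has already been read into the buffer.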

>>I thought UDP would deliver (or drop) a whole datagram but cannot find
>>anything in the Python documentation to guarantee that. In fact
>>documentation for the send() call says that apps are responsible for
>>checking that all data has been sent. They may mean that to apply to
>>stream protocols only but it doesn't state that. (Of course, UDP
>>datagrams are limited in size so the call may validly indicate
>>incomplete transmission even when the first part of a big message is
>>sent successfully.)
>>
> Looking in the wrong documentation <G>
>
> You probably should be looking at the UDP RFC. Or maybe just
>
> http://www.diffen.com/difference/TCP_vs_UDP
>
> """
> Packets are sent individually and are checked for integrity only if they
> arrive. Packets have definite boundaries which are honored upon receipt,
> meaning a read operation at the receiver socket will yield an entire
> message as it was originally sent.
> """

I would rather see it in the Python docs because we program to the 
language standard and there can be - and often are, for good reason - 
areas where Python does not work in the same way as underlying systems.

> Even if the IP layer has to fragment a UDP packet to meet limits of the
> transport media, it should put them back together on the other end before
> passing it up to the UDP layer. To my knowledge, UDP does not have a size
> limit on the message (well -- a 16-bit length field in the UDP header). But
> since it /is/ "got it all" or "dropped" with no inherent confirmation, one
> would have to embed their own protocol within it -- sequence numbers with
> ACK/NAK, for example. Problem: if using LARGE UDP packets, this protocol
> would mean having LARGE resends should packets be dropped or arrive out of
> sequence (and since the ACK/NAK could be dropped too, you may have to
> handle the case of a duplicated packet -- also large).

Yes, it was the 16-bit limitation that I was talking about.

> TCP is a stream protocol -- the protocol will ensure that all data
> arrives, and that it arrives in order, but does not enforce any boundaries
> on the data; what started as a relatively large packet at one end may
> arrive as lots of small packets due to intermediate transport limits (one
> can visualize a worst case: each TCP packet is broken up to fit Hollerith
> cards; 20 bytes for header and 60 bytes of data -- then fed to a reader and
> sent on AS-IS). Boundaries are the end-user responsibility... line endings
> (look at SMTP, where an email message ends on a line containing just a ".")
> or embedded length counter (not the TCP packet length).

Yes.

>>Receiving no bytes is taken as indicating the end of the communication.
>>That's OK for TCP but not for UDP so there should be a way to
>>distinguish between the end of data and receiving an empty datagram.
>>
> I don't believe UDP supports a truly empty datagram (length of 0) --
> presuming a sending stack actually sends one, the receiving stack will
> probably drop it as there is no data to pass on to a client (there is a PR
> at work because we have a UDP driver that doesn't drop 0-length messages,
> but also can't deliver them -- so the circular buffer might fill with
> undeliverable headers)

As others have pointed out, UDP implementations do seem to work with 
zero-byte datagrams properly. Again, I would rather see that in the 
Python documentation which is what, effectively, forms a contract that 
we should be able to rely on.
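
A quick way to check this on a given platform is a loopback round trip like the sketch below. The function name and the two-second timeout are assumptions for the example, and the result shows only what the local stack does, not anything Python promises. Where the stack delivers the datagram, recvfrom() returns b'' together with the sender's address, which for UDP is an ordinary message rather than an end-of-data signal.

```python
import socket


def udp_empty_datagram_roundtrip():
    """Send a zero-byte UDP datagram over loopback and return what arrives.

    Hypothetical check: returns the (data, address) pair from recvfrom(),
    or raises socket.timeout if the local stack drops the datagram.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # do not hang if nothing is delivered
        sender.sendto(b"", receiver.getsockname())
        return receiver.recvfrom(4096)
    finally:
        sender.close()
        receiver.close()
```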

James



