TCP packet size?

Wed Jun 14 04:08:14 EDT 2000

The standard approach in these cases is to use one of a large number of
schemes (each with a large number of variations) for encoding your
information.  Some simple examples:

    Data Length Encoding: length, data, length, data... --> an encoded
length value, followed by x bytes (or words, or what have you) of data.
The other end reads the length, then reads until it has that much data
(appending the results of successive reads until the total equals the
desired length, holding on to partial data while a read comes up short, and
storing any surplus for the next message).  This
is most popular in binary formats and UDP protocols, where byte offsets seem
more natural and messages are discrete.  It is more efficient in most cases
than other systems.  It is somewhat fragile, in that if you lose the length
byte you can wind up foobared (since you can never again determine where
the message boundaries are in that stream).  In case of such a failure, you
need to re-establish the stream/connection (expensive).  See any number of
formats from the C/C++ worlds, see also Python's pickle format (if I'm
remembering correctly).
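
A minimal sketch of the length-prefix idea in Python, for illustration only.
The 4-byte big-endian header and the names pack_message and LengthDecoder are
my own assumptions, not part of any particular protocol.  The decoder keeps
whatever it has not yet consumed, so it copes with both split and combined
reads:

```python
import struct

# Assumed header layout: one 4-byte unsigned big-endian length.
HEADER = struct.Struct("!I")


def pack_message(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


class LengthDecoder:
    """Hypothetical helper: turns arbitrary chunks back into whole messages."""

    def __init__(self):
        self._buffer = b""  # bytes received but not yet consumed

    def feed(self, chunk: bytes) -> list:
        """Add one chunk (e.g. the result of one recv) and return any
        messages that are now complete."""
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # body not fully received yet; wait for more data
            messages.append(self._buffer[HEADER.size:end])
            self._buffer = self._buffer[end:]  # keep the surplus for later
        return messages
```

On the receiving side you would call something like
decoder.feed(sock.recv(4096)) in a loop and handle whatever list comes back,
which may be empty, one message, or several.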

    Markup: escaped data, escape character (or sequence), escaped data,
... --> read data until you get the escape character (or set of characters, or
what have you), then process the message.  If what you have received so far
does not yet contain the escape character, wait for the next bit of data and
append, repeating until you have a message, and store any remainder for the
next iteration.  This is more popular in human-readable data, and is what
XML-based protocols use.  See also MIME encoding (multipart).
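
A rough sketch of the same buffering loop for the delimiter case.  I am
assuming a NUL byte as the terminator and that it never appears inside a
message (real markup schemes escape it or use a longer sequence); the
DelimiterDecoder name is hypothetical:

```python
# Assumed terminator; it must never occur inside a message body.
DELIMITER = b"\0"


class DelimiterDecoder:
    """Hypothetical helper: splits a byte stream on a terminator."""

    def __init__(self, delimiter=DELIMITER):
        self._delimiter = delimiter
        self._buffer = b""  # trailing partial message, if any

    def feed(self, chunk):
        """Add one chunk and return every message completed so far."""
        self._buffer += chunk
        # Everything before the last delimiter is complete; the final
        # piece is an unfinished message (possibly empty) and is kept.
        *messages, self._buffer = self._buffer.split(self._delimiter)
        return messages
```

The sender just appends the terminator to each message before writing it to
the socket.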

    Control channel encoding: a secondary channel gives indices into the
primary stream as boundaries for messages.  Multi-media control protocols
often use this approach (they don't want to touch the multimedia stream, just
let the client know where to start/stop reading, etc.).
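
A toy sketch of the control-channel idea, under loud assumptions: the control
channel is taken to deliver absolute (start, end) byte offsets into the data
stream, and the whole data stream is kept in memory for simplicity.  The
OffsetReader name and its methods are made up for this example:

```python
class OffsetReader:
    """Hypothetical helper: cuts messages out of an untouched data stream
    using boundaries announced on a separate control channel."""

    def __init__(self):
        self._data = b""    # everything received on the data channel
        self._pending = []  # (start, end) pairs whose bytes are not all here

    def feed_data(self, chunk):
        self._data += chunk

    def feed_control(self, start, end):
        self._pending.append((start, end))

    def ready(self):
        """Return the messages whose full byte range has now arrived."""
        have = len(self._data)
        done = [(s, e) for s, e in self._pending if e <= have]
        self._pending = [(s, e) for s, e in self._pending if e > have]
        return [self._data[s:e] for s, e in done]
```

A real implementation would discard data it has already handed out rather
than keep the whole stream around.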

HTH,
Mike

-----Original Message-----
From: chris [mailto:chris at rpgarchive.com]
Sent: Thursday, June 15, 2000 6:14 AM
To: Mike Fletcher; Python List
Subject: Re: TCP packet size?

I'll try a more specific explanation.  My application is pretty much a
multi-user application similar to Net Meeting or a chat room.  One major
piece of data that the users are sharing is an xml document (in memory, not a
file).  One application is the host who accepts TCP connections from any
number of clients.   If a user changes the xml document, those changes are
sent to the host and then resent to each client.  Again, all connections are
TCP.   A problem arises when I send a large message (3000-4000 bytes).
Despite the fact that I give a large buffer size to the socket.read() method, I
often receive that message disassembled.  Also, sometimes I'll find small
messages combined in the same read() call.  The messages just aren't
reassembled the way I thought TCP worked.  I'm hoping to find a way to
receive one message at a time in its entirety, or determine the start and
end of my messages.  I'm not really using xml as a network protocol, it's
just the data structure of the application.  I'd be happy to show you my
networking module if you like (you probably just cringed :) ) 
One way I tried to solve my problem was to break my messages into smaller
(1000 byte) messages that included a header.  However, even though I broke
the message up, it was not received in the same format.  Some of the messages
were combined and broken at unpredictable positions.
FYI, this might be silly to add, but I'm not creating a new TCP connection
for every message. I'm using the same connection repeatedly. 
Thanks for the help.  I'm kind of at my wits end. 
Mike Fletcher wrote: