How to receive a data file of unknown length using a python socket?

Hendrik van Rooyen mail at microcorp.co.za
Sun Jul 19 05:47:07 EDT 2009


On Sunday 19 July 2009 02:12:32 John Machin wrote:
>
> Apologies in advance for my ignorance -- the last time I dipped my toe
> in that kind of water, protocols like zmodem and Kermit were all the
> rage -- but I would have thought there would have been an off-the-
> shelf library for peer-to-peer file transfer over a socket
> interface ... not so?

*Grins at the references to Kermit and zmodem, 
and remembers Laplink and PC Anywhere*

If there is such a transfer beast in Python, I have 
not found it.
(There is an FTP module but that is not quite
the same thing)

I think it is because file transfer is now
all done by the OS, or by NFS and Samba,
with drag and drop support and
other nice goodies.

I have ended up writing a netstring thingy,
that addresses the string transfer problem
by having a start sentinel, a four byte ASCII
length (so you can see it with a packet 
sniffer/displayer) and the rest of the
data escaped to take out the start
sentinel and the escape character.

It works, but the four-byte ASCII length field
limits the size of what can be sent and received
in one go.

It guarantees either to deliver the whole
string, or to fail, or to time out.

If anybody is interested I will attach the 
code here. It is not a big module.

This question seems to come up periodically
in different guises.

To the OP:

There are really very few valid ways of
solving the string transfer problem,
given a featureless stream of bytes
like a socket.

The first thing that must be addressed
is to sync up - you have to somehow
find the start of the thing as it comes 
past.

And the second is to find the end of the 
slug of data that you are transferring.

So the simplest way is to designate a byte 
as a start and end sentinel, and to make 
sure that such a byte does not occur in 
the data stream, other than as a start
and end marker.  Replacing any such byte
in the data with an escape sequence is called
escaping, and the reverse is called
unescaping. (SDLC/HDLC does this at a bit 
pattern level)
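
A minimal sketch of the sentinel idea at byte level, in case it
helps. The byte values and the function names are made up for
illustration, and the single sentinel doubles as start and end
marker, so any junk between two slugs would be mistaken for a slug.

```python
# Hypothetical byte values, chosen only for illustration.
SENTINEL = b"\x7e"   # marks the start and the end of a slug
ESCAPE = b"\x7d"     # announces that the next byte was altered


def escape(data: bytes) -> bytes:
    # Escape the escape byte first, so that the escape bytes added
    # for sentinels in the second step are not escaped again.
    data = data.replace(ESCAPE, ESCAPE + b"\x5d")
    return data.replace(SENTINEL, ESCAPE + b"\x5e")


def unescape(data: bytes) -> bytes:
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESCAPE[0]:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("slug ends in the middle of an escape")
            out.append(nxt ^ 0x20)   # 0x5d -> 0x7d, 0x5e -> 0x7e
        else:
            out.append(b)
    return bytes(out)


def frame(data: bytes) -> bytes:
    return SENTINEL + escape(data) + SENTINEL


def deframe(stream: bytes) -> list:
    # Drop whatever precedes the first sentinel (we were not synced
    # yet) and whatever follows the last one (possibly incomplete).
    # Empty pieces are the gaps between back-to-back slugs, so this
    # simple version cannot carry an empty slug.
    pieces = stream.split(SENTINEL)[1:-1]
    return [unescape(p) for p in pieces if p]
```

After escaping, the sentinel byte cannot occur inside a slug, so a
receiver that joins the stream late only has to wait for the next
sentinel to be back in step.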

Another way is to use time, namely to
rely on there being some minimum
time between slugs of data.  This 
does not work well on TCP/IP sockets,
as retries at the lower protocol levels
can give you false breaks in the stream.
It works well on direct connections like
RS-232 or RS-485/422 lines.

Classic netstrings send length, then data.
They rely on the lower level protocols and
the length sent for demarcation of
the slug, and work well if you connect,
send a slug or two, and disconnect.  They 
are not so hot for long running processes, 
where processors can drop out while
sending - there is no reliable way for a 
stable receiver to sync up again if it is
waiting for a slug that will not finish.
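
For reference, a rough sketch of the classic netstring layout
(decimal length, a colon, the data, a trailing comma) over a
socket. The function names and the max_len cap are my own
assumptions, not part of any standard module.

```python
import socket


def recv_exact(sock, n):
    # recv() may return fewer bytes than asked for, so loop.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed in the middle of a slug")
        buf.extend(chunk)
    return bytes(buf)


def send_netstring(sock, data):
    sock.sendall(str(len(data)).encode("ascii") + b":" + data + b",")


def recv_netstring(sock, max_len=1 << 20):
    # Read the decimal length one byte at a time, up to the colon.
    digits = bytearray()
    while True:
        c = recv_exact(sock, 1)
        if c == b":":
            break
        if not c.isdigit() or len(digits) >= 9:
            raise ValueError("bad length field")
        digits += c
    if not digits:
        raise ValueError("empty length field")
    length = int(bytes(digits))
    if length > max_len:
        raise ValueError("slug too long")
    data = recv_exact(sock, length)
    if recv_exact(sock, 1) != b",":
        raise ValueError("missing trailing comma")
    return data
```

Note that recv_exact() simply blocks if the sender dies half way
through, which is exactly the weakness described above: the
receiver has no marker to hunt for and no clock to give up on.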

Adapting the netstring by adding a sync
character and time out is a compromise 
that I have found works well in practice.
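
This is not my module, just a small sketch of the idea under some
assumptions: the sync and escape byte values are invented, the
four ASCII digits are taken as decimal, they count the escaped
bytes on the wire, and the timeout applies to each recv() call
rather than to the slug as a whole.

```python
import socket

SYNC = 0x02   # hypothetical start sentinel
ESC = 0x1B    # hypothetical escape character


def encode(payload):
    body = bytearray()
    for b in payload:
        if b in (SYNC, ESC):
            body += bytes([ESC, b ^ 0x20])   # neither result is SYNC or ESC
        else:
            body.append(b)
    if len(body) > 9999:
        raise ValueError("too long for a four digit length")
    return bytes([SYNC]) + b"%04d" % len(body) + bytes(body)


def decode(body):
    out = bytearray()
    it = iter(body)
    for b in it:
        if b == ESC:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("slug ends in the middle of an escape")
            out.append(nxt ^ 0x20)
        else:
            out.append(b)
    return bytes(out)


def recv_slug(sock, timeout=5.0):
    sock.settimeout(timeout)   # socket.timeout is raised if the sender stalls
    buf = None                 # None means "still hunting for SYNC"
    while True:
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed before a whole slug arrived")
        b = chunk[0]
        if b == SYNC:
            buf = bytearray()  # SYNC never occurs inside a slug: start over
            continue
        if buf is None:
            continue           # junk before the first SYNC is discarded
        buf.append(b)
        if len(buf) == 4 and not buf.isdigit():
            buf = None         # bad header, go back to hunting
        elif len(buf) >= 4 and len(buf) == 4 + int(bytes(buf[:4])):
            return decode(bytes(buf[4:]))
```

If a sender drops out half way, the next sync character restarts
the receiver, and if no more bytes arrive at all the timeout gets
it out of the wait.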

- Hendrik



