zipped socket

Tue Aug 9 21:30:56 EDT 2005

jepler at unpythonic.net wrote:
 > As far as I know, there is not a prefabbed solution for this problem.
 > One issue that you must solve is the issue of buffering (when must
 > some data you've written to the compressor really go out to the other
 > side) and the issue of what to do when a read() or recv() reads
 > gzipped bytes but these don't produce any additional unzipped
 > bytes---this is a problem because normally a read() that returns ''
 > indicates end-of-file.
 >
 > If you only work with whole files at a time, then one easy thing to
 > do is use the 'zlib' encoding:
 >     >>> "abc".encode("zlib")
 >     "x\x9cKLJ\x06\x00\x02M\x01'"
 >     >>> _.decode("zlib")
 >     'abc'
 > ... but because zlib isn't self-delimiting, this won't work if you
 > want to write() multiple times, or if you want to read() less than
 > the full file

That's basically a solved problem; zlib does support a kind of
self-delimiting, in that you can force out everything written so
far at any point in the stream. The key is the flush() method of
the compression object:

     some_send_function(compressor.flush(zlib.Z_SYNC_FLUSH))

The Python module doc is unclear/wrong on this, but zlib.h
explains:

     If the parameter flush is set to Z_SYNC_FLUSH, all pending
     output is flushed to the output buffer and the output is
     aligned on a byte boundary, so that the decompressor can get
     all input data available so far.
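
To make that concrete, here is a minimal sketch in Python 3 syntax.
The names make_sender, send_message and wire are made up for
illustration, and a plain list stands in for the socket. Note that
this only guarantees the receiver can decode everything sent so
far; it does not preserve message boundaries on a real stream
socket, where recv() may return the bytes in different pieces.

```python
import zlib

def make_sender(send):
    """Return a function that compresses one message and pushes it out at once.

    `send` is any callable taking bytes (hypothetical stand-in for sock.sendall).
    """
    compressor = zlib.compressobj()

    def send_message(data):
        # compress() may hold data back in its internal buffer;
        # flush(Z_SYNC_FLUSH) forces out everything written so far
        # while keeping the stream (and its dictionary) open.
        send(compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH))

    return send_message

wire = []                               # stand-in for the socket
send_message = make_sender(wire.append)
decompressor = zlib.decompressobj()     # one decompressor for the whole stream

received = []
for msg in (b"hello", b"world", b"hello again"):
    send_message(msg)
    # Each flushed chunk decodes to the data written before the flush.
    received.append(decompressor.decompress(wire[-1]))

print(received)
```

The same compressor and decompressor objects are reused for every
message, so later messages can still refer back to earlier data.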

There's also Z_FULL_FLUSH, which additionally resets the
compression dictionary. For a stream socket, we'd usually want to
keep the dictionary, since references back to earlier data are
what give us most of the compression. The Python doc states:

     Z_SYNC_FLUSH and Z_FULL_FLUSH allow compressing further
     strings of data and are used to allow partial error recovery
     on decompression

That's not correct. Z_FULL_FLUSH allows recovery after errors,
because decompression can restart from that point. Z_SYNC_FLUSH
only ensures that everything fed to the compressor so far can
come out of the decompressor.
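
The cost of resetting the dictionary is easy to measure. The
following is a rough sketch in Python 3 syntax; MESSAGE and
total_bytes_sent are made-up names, and the sample message is
just an illustration. It sends the same message ten times and
compares the total compressed size under each flush mode. The
exact byte counts depend on the zlib version, so none are claimed
here.

```python
import zlib

# Hypothetical repeated message, as a chatty protocol might send.
MESSAGE = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\nAccept: text/html\r\n\r\n"

def total_bytes_sent(flush_mode, repeats=10):
    """Compress MESSAGE `repeats` times, flushing after each, and total the output."""
    compressor = zlib.compressobj()
    total = 0
    for _ in range(repeats):
        chunk = compressor.compress(MESSAGE) + compressor.flush(flush_mode)
        total += len(chunk)
    return total

# Z_SYNC_FLUSH keeps the dictionary, so repeats shrink to short back-references.
sync_total = total_bytes_sent(zlib.Z_SYNC_FLUSH)
# Z_FULL_FLUSH resets it, so every repeat is compressed from scratch.
full_total = total_bytes_sent(zlib.Z_FULL_FLUSH)

print("Z_SYNC_FLUSH:", sync_total, "bytes")
print("Z_FULL_FLUSH:", full_total, "bytes")
```

With repetitive traffic like this, the Z_SYNC_FLUSH total should
come out noticeably smaller.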

-- 
--Bryan