[Python-bugs-list] [Bug #124981] zlib decompress of sync-flushed data fails

noreply@sourceforge.net noreply@sourceforge.net
Tue, 12 Dec 2000 15:18:06 -0800

Bug #124981, was updated on 2000-Dec-07 23:25
Here is a current snapshot of the bug.

Project: Python
Category: Documentation
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Submitted by: abo
Assigned to : fdrake
Summary: zlib decompress of sync-flushed data fails

Details: I'm not sure if this is just an undocumented limitation or a genuine bug. I'm using Python 1.5.2 on WinNT.

A single decompress of a large amount (16K+) of compressed data that has been sync-flushed fails to produce all the data up to the sync-flush. The data remains inside the decompressor until further compressed data or a final flush is issued. Note that the 'unused_data' attribute does not show that there is further data in the decompressor to process (it shows '').

A workaround is to decompress the data in smaller chunks. Note that compressing data in smaller chunks is not required, as the problem is in the decompressor, not the compressor.
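The chunked workaround described above might look roughly like the sketch below. It is written for a current Python 3 interpreter rather than the 1.5.2 of this report; the helper name decompress_in_chunks and the 1024-byte chunk size are assumptions made for illustration, and os.urandom stands in for whatever data is really being sent.

```python
import os
import zlib


def decompress_in_chunks(decompressor, data, chunk_size=1024):
    """Feed `data` to `decompressor` in slices of at most `chunk_size` bytes.

    Hypothetical helper: the name and the default chunk size are
    illustrative assumptions, not part of the zlib module.
    """
    pieces = []
    for start in range(0, len(data), chunk_size):
        pieces.append(decompressor.decompress(data[start:start + chunk_size]))
    return b"".join(pieces)


compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

# 32K of random input, compressed in one go and then sync-flushed.
original = os.urandom(32 * 1024)
packet = compressor.compress(original) + compressor.flush(zlib.Z_SYNC_FLUSH)

# Only the decompressing side is chunked; the compressor is left alone.
restored = decompress_in_chunks(decompressor, packet)
print(len(original), len(restored))
```

The compressor side is untouched, which matches the observation that the problem lies in the decompressor.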

The following code demonstrates the problem, and raises an exception when the compressed data reaches 17K:

from zlib import *
from random import *

# create compressor and decompressor
c = compressobj()
d = decompressobj()

# try data sizes of 1-63K
for l in range(1,64):
    # generate random data stream
    a = ''
    for i in range(l*1024):
        a = a + chr(randint(0,255))
    # compress, sync-flush, and decompress
    t = d.decompress(c.compress(a) + c.flush(Z_SYNC_FLUSH))
    # if decompressed data is different to input data, barf,
    if len(t) != len(a):
        print len(a),len(t),len(d.unused_data)
        raise error
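For anyone wanting to repeat the test on a current interpreter, here is a rough Python 3 port. It is a sketch only: the function name find_short_sizes is made up for illustration, os.urandom replaces the random module, and instead of raising it collects every size at which a single decompress() call returned fewer bytes than were compressed.

```python
import os
import zlib


def find_short_sizes(max_kib=63):
    """Return (input_len, output_len) pairs where decompress() came up short.

    Hypothetical helper for illustration; it follows the steps of the
    test in this report but records mismatches instead of raising.
    """
    compressor = zlib.compressobj()
    decompressor = zlib.decompressobj()
    short = []
    for kib in range(1, max_kib + 1):
        # generate a random data stream of `kib` kilobytes
        data = os.urandom(kib * 1024)
        # compress, sync-flush, and decompress in a single call
        packet = compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)
        out = decompressor.decompress(packet)
        if len(out) != len(data):
            short.append((len(data), len(out)))
    return short


short_sizes = find_short_sizes()
print(short_sizes)
```

An empty list means no shortfall was seen at any of the sizes tried.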


Date: 2000-Dec-12 15:18
By: abo

Further comments...

After looking at the C code, a few things became clear: I need to read more about C/Python interfacing, and the "unused_data" attribute will only contain data if additional data is fed to a decompressor after the end of a complete compressed stream.

The purpose of the "unused_data" attribute is not clear in the documentation, so that should probably be clarified (mind you, I am looking at pre-2.0 docs so maybe it already has?).
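To illustrate the behaviour of "unused_data" described above, here is a small sketch for a current Python 3 interpreter (not the 1.5.2/2.0 builds discussed here). The payload and trailer values are arbitrary assumptions chosen for the example.

```python
import zlib

payload = b"hello world" * 100   # arbitrary sample data
trailer = b"extra bytes"          # bytes that follow a complete stream

# Case 1: a complete compressed stream followed by extra bytes.
complete = zlib.compress(payload)
d1 = zlib.decompressobj()
out1 = d1.decompress(complete + trailer)
after_stream = d1.unused_data     # the bytes past the end of the stream

# Case 2: a stream that was only sync-flushed, so it has not ended.
c = zlib.compressobj()
flushed = c.compress(payload) + c.flush(zlib.Z_SYNC_FLUSH)
d2 = zlib.decompressobj()
out2 = d2.decompress(flushed)
mid_stream = d2.unused_data       # nothing lies beyond the stream's end yet

print(repr(after_stream), repr(mid_stream))
```

So "unused_data" reports bytes beyond the end of the compressed stream; it is not a count of input the decompressor has yet to process.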

The failure to produce all data up to a sync-flush is something else... I'm still looking into it. I'm not sure if it is an inherent limitation of zlib, something that needs to be fixed in zlib, or something that needs to be fixed in the python interface. If it is an inherent limitation, I'd like to characterise it a bit better before documenting it. If it is something that needs to be fixed in either zlib or the python interface, I'd like to fix it.

Unfortunately, this is a bit beyond me at the moment, mainly in time, but also a bit in skill (need to read the Python/C interfacing documentation). Maybe over the Christmas holidays I'll get a chance to fix it.


Date: 2000-Dec-12 13:32
By: gvanrossum

OK, assigned to Fred. You may ask Andrew what to write. :-)

Date: 2000-Dec-08 14:50
By: abo

I'm not that sure I'm happy with it just being marked closed. AFAICT, the implementation definitely doesn't do what the documentation says, so to save people like me time when they hit it, I'd prefer the bug at least be assigned to documentation so that the limitation is documented.

From my reading of the documentation as it stands, the fact that there is more pending data in the decompressor should be indicated by its "unused_data" attribute. The tests seem to show that "decompress()" is only processing 16K of compressed data each call, which would suggest that "unused_data" should contain the rest. However, in all my tests that attribute has always been empty. Perhaps the bug is in there somewhere?

Another slight strangeness: even if "unused_data" did contain something, the only way to get it out is by feeding in more compressed data, or issuing a flush(), thus ending the decompression...

I guess that since I've been bitten by this, it's up to me to fix it. I've got the source to 2.0 and I'll have a look and see if I can submit a patch.

<sigh> and I was coding this app in python to avoid coding in C :-)

Date: 2000-Dec-08 09:26
By: akuchling

Python 2.0 demonstrates the problem, too.

I'm not sure what this is: a zlibmodule bug/oversight or simply problems with zlib's API.  Looking at zlib.h, it implies that you'd have to call inflate() with the flush parameter set to Z_SYNC_FLUSH to get the remaining data.  Unfortunately this doesn't seem to help -- the .flush() method doesn't support an argument, but when I patch zlibmodule.c to allow one, .flush(Z_SYNC_FLUSH) always fails with a -5: buffer error, perhaps because it expects there to be some new data.

(The DEFAULTALLOC constant in zlibmodule.c is 16K, but this seems to be unrelated to the problem showing up with more than 16K of data, since changing DEFAULTALLOC to 32K or 1K makes no difference to the size of data at which the bug shows up.)

In short, I have no idea what's at fault, or if it can or should be fixed.  Unless you or someone else submits a patch, I'll just leave it alone, and mark this bug as closed and "Won't fix".


Date: 2000-Dec-08 07:44
By: gvanrossum

I *think* this may have been fixed in Python 2.0.

I'm assigning this to Andrew who can confirm that and close the bug report (if it is fixed).

Date: 2000-Dec-07 23:28
By: abo

Argh... SF killed all my indents... sorry about that. You should be able to figure it out, but if not, email me and I can send a copy.
