[Patches] [ python-Patches-618135 ] gzip.py and files > 2G

noreply@sourceforge.net
Tue, 05 Nov 2002 12:40:41 -0800


Patches item #618135, was opened at 2002-10-03 12:16
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Geert Jansen (geertj)
Assigned to: Tim Peters (tim_one)
Summary: gzip.py and files > 2G

Initial Comment:
Problem:

Currently, the gzip module cannot handle files larger than
2G uncompressed. The source of the problem is the trailer
at the end of a .gz file, which contains a 32-bit length
field. This field cannot represent a file length > 4G, and
because of mixed-type arithmetic in gzip.py, the effective
limit is lowered to 2G.
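
For illustration, the trailer described above can be inspected with a
few lines of modern Python. This is only a sketch: the helper name
read_gzip_trailer is hypothetical, and it assumes a single-member .gz
stream held entirely in memory, with the 8-byte trailer laid out as a
little-endian CRC32 followed by the 32-bit length field.

```python
import gzip
import io
import struct
import zlib


def read_gzip_trailer(gz_bytes):
    """Return (crc32, isize) taken from the last 8 bytes of a gzip stream.

    Hypothetical helper: assumes gz_bytes holds exactly one gzip member.
    """
    # Both fields are unsigned 32-bit little-endian integers.
    crc32, isize = struct.unpack("<LL", gz_bytes[-8:])
    return crc32, isize


# Compress a small payload in memory and look at its trailer.
payload = b"hello gzip" * 1000
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(payload)

crc32, isize = read_gzip_trailer(buf.getvalue())
expected_crc = zlib.crc32(payload) & 0xFFFFFFFF
```

For a small payload the length field simply equals the payload length;
only above 4G does the stored value wrap around.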

Testcase:

python gzip.py <file> # must be > 2G
python gzip.py -d <file.gz> # error

Proposed fix:

Test the uncompressed data size modulo 4G. A patch 
implementing this fix is attached. This is also the 
solution that gzip itself uses.
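
A minimal sketch of the modulo-4G comparison, not the attached patch
itself: the names MASK32 and size_matches_trailer are made up for this
example, and the sizes are plain integers rather than values read from
a real file.

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1, i.e. "modulo 4G" as a bit mask


def size_matches_trailer(actual_size, isize_field):
    """Compare the decompressed byte count with the 32-bit length field.

    Only the low 32 bits of actual_size are significant, because that is
    all the trailer can store.
    """
    return (actual_size & MASK32) == isize_field


five_gib = 5 * 2**30
# What a writer would store in the 32-bit field for a 5G input.
stored = five_gib & MASK32
```

The trade-off is that a 1G file and a 5G file store the same value in
this field, so the length check alone can no longer tell them apart.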

Two other remarks:

I don't understand lines 22-23 of gzip.py: why is the 
test: "if value < 0" necessary when writing an unsigned 
int?

The testing of the crc value in GzipFile._read_eof() is 
done modulo 4G. Is this necessary? crc32 is just read 
from the file as a normal int, and self.crc is from zlib.crc32 
which always returns a regular int.
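
One possible reason for both the "value < 0" test and the masked CRC
comparison is that the same 32 bits can show up as either a signed or
an unsigned number, depending on how they were produced or unpacked.
The sketch below only illustrates that effect; it is not the code in
gzip.py, and MASK32, crc_equal and the sample value are assumptions
chosen for the example.

```python
import struct

MASK32 = 0xFFFFFFFF

# A sample 32-bit value with the high bit set (chosen for illustration).
crc = 0xDEADBEEF
raw = struct.pack("<L", crc)

# The same four bytes, interpreted two ways.
as_unsigned = struct.unpack("<L", raw)[0]
as_signed = struct.unpack("<l", raw)[0]  # negative: high bit is set


def crc_equal(a, b):
    """Compare two CRC values using only their low 32 bits."""
    return (a & MASK32) == (b & MASK32)
```

A plain "==" would report a mismatch between as_signed and as_unsigned
here, while the masked comparison treats them as the same checksum.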

Regards,
Geert Jansen

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-11-05 15:40

Message:
Logged In: YES 
user_id=31435

Got it.  It's distasteful but pragmatic <wink>.  Fixed again, in

Lib/gzip.py; new revision: 1.37
Misc/NEWS; new revision: 1.510

It was tested "by hand" on Win2K (on a 6+GB file).

----------------------------------------------------------------------

Comment By: Geert Jansen (geertj)
Date: 2002-11-05 05:36

Message:
Logged In: YES 
user_id=537938

I'm afraid this doesn't fix the whole problem.

You fixed the problem for file sizes in the range 2G-4G, but (if 
I read your patch correctly), files >4G still don't work. On 
Linux it is very easy to create files > 4G and Python supports 
this, so it would be nice to have.

A better fix IMHO would be to test the file size modulo 4G.  
The probability that an invalid gzip file becomes valid by this 
less accurate test is astronomically small (there is also a 
CRC). In fact, this is also the fix that the "official" gzip 
program uses.

I can give you a test account on my Linux machine if you 
want to test a patch and don't have a machine with large file 
support nearby. Or I can test a patch for you.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-11-04 14:51

Message:
Logged In: YES 
user_id=31435

Fixed, by related changes in

Lib/gzip.py; new revision: 1.36
Misc/NEWS; new revision: 1.508

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-11-04 12:08

Message:
Logged In: YES 
user_id=31435

Assigned to me.  I think your suggested fix makes good 
sense.

----------------------------------------------------------------------

Comment By: Geert Jansen (geertj)
Date: 2002-10-04 03:36

Message:
Logged In: YES 
user_id=537938

Sorry -- it seems the file upload went wrong! Second try.

----------------------------------------------------------------------
