[Patches] [ python-Patches-618135 ] gzip.py and files > 2G
noreply@sourceforge.net
Tue, 05 Nov 2002 12:40:41 -0800
Patches item #618135, was opened at 2002-10-03 12:16
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=618135&group_id=5470
Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Geert Jansen (geertj)
Assigned to: Tim Peters (tim_one)
Summary: gzip.py and files > 2G
Initial Comment:
Problem:
Currently, the gzip module is not able to work with files
> 2G uncompressed. The source of the problem is that
at the end of a .gz file, there is a trailer containing a 32
bit length field. This field is of course unable to represent
a file length > 4G. Because of mixed type arithmetic in
gzip.py, this limit is lowered to 2G.
Testcase:
python gzip.py <file> # must be > 2G
python gzip.py -d <file.gz> # error
Proposed fix:
Test the uncompressed data size modulo 4G. A patch
implementing this fix is attached. This is also the
solution that gzip itself uses.
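For illustration only (this is not the attached patch): a minimal sketch of the modulo-4G size test, written for a current Python 3. The helper names read_trailer and size_matches are hypothetical. The trailer layout (CRC32 followed by the 32-bit size field, both little-endian) is the one defined by the gzip file format.

```python
import gzip
import io
import struct
import zlib

MASK32 = 0xFFFFFFFF  # 2**32 - 1; "modulo 4G" is the same as masking with this


def read_trailer(gz_bytes):
    """Return (crc32, isize) from the last 8 bytes of a single-member gzip stream."""
    # Both fields are little-endian unsigned 32-bit integers.
    return struct.unpack("<II", gz_bytes[-8:])


def size_matches(isize, actual_size):
    """Compare the stored size with the real size modulo 2**32."""
    return isize == (actual_size & MASK32)


# Round-trip a small payload in memory and inspect its trailer.
payload = b"spam and eggs" * 1000
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(payload)

crc, isize = read_trailer(buf.getvalue())
small_ok = size_matches(isize, len(payload))
crc_ok = crc == (zlib.crc32(payload) & MASK32)

# Pure arithmetic for a hypothetical 5 GiB file: the 32-bit field wraps around,
# so the trailer would record only the remainder after dividing by 4 GiB.
five_gib = 5 * 2**30
wrapped = five_gib & MASK32
```

Because the comparison is done on both sides modulo 2**32, the same test covers files below 2G, between 2G and 4G, and above 4G.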
Two other remarks:
I don't understand lines 22-23 of gzip.py: why is the
test: "if value < 0" necessary when writing an unsigned
int?
The testing of the crc value in GzipFile._read_eof() is
done modulo 4G. Is this necessary? crc32 is just read
from the file as a normal int, and self.crc is from zlib.crc32
which always returns a regular int.
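One possible reason for both checks, offered as an assumption and not as a description of what gzip.py actually does: if a 32-bit value is ever handled as a signed int, it can come out negative even though the bits are the same. Reducing it modulo 2**32 makes the two readings comparable. The sketch below uses hypothetical helper names (to_unsigned32, pack_u32) and a made-up CRC value.

```python
import struct

MASK32 = 0xFFFFFFFF


def to_unsigned32(value):
    """Map any integer onto 0 .. 2**32 - 1, i.e. reduce it modulo 2**32."""
    return value & MASK32


def pack_u32(output, value):
    """Append value to a bytearray as a little-endian unsigned 32-bit field."""
    output.extend(struct.pack("<I", to_unsigned32(value)))


# A made-up CRC as it would look if read as a signed 32-bit int.
signed_crc = -1294967296
# The same bit pattern, read as an unsigned 32-bit int.
unsigned_crc = to_unsigned32(signed_crc)

out = bytearray()
pack_u32(out, signed_crc)

# Reading the four bytes back both ways shows it is one and the same field.
read_unsigned = struct.unpack("<I", bytes(out))[0]
read_signed = struct.unpack("<i", bytes(out))[0]
```

Without the masking step, struct.pack("<I", ...) rejects a negative value, which may be what a test like "if value < 0" guards against.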
Regards,
Geert Jansen
----------------------------------------------------------------------
>Comment By: Tim Peters (tim_one)
Date: 2002-11-05 15:40
Message:
Logged In: YES
user_id=31435
Got it. It's distasteful but pragmatic <wink>. Fixed again, in
Lib/gzip.py; new revision: 1.37
Misc/NEWS; new revision: 1.510
It was tested "by hand" on Win2K (on a 6+GB file).
----------------------------------------------------------------------
Comment By: Geert Jansen (geertj)
Date: 2002-11-05 05:36
Message:
Logged In: YES
user_id=537938
I'm afraid this doesn't fix the whole problem.
You fixed the problem for file sizes in the range 2G-4G, but (if
I read your patch correctly), files >4G still don't work. On
Linux it is very easy to create files > 4G and Python supports
this, so it would be nice to have.
A better fix IMHO would be to test the file size modulo 4G.
The probability that an invalid gzip file becomes valid under this
less accurate test is astronomically small (there is also a
CRC). In fact, this is also the fix that the "official" gzip
program uses.
I can give you a test account on my Linux machine if you
want to test a patch and don't have a machine with large file
support nearby. Or I can test a patch for you.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-11-04 14:51
Message:
Logged In: YES
user_id=31435
Fixed, by related changes in
Lib/gzip.py; new revision: 1.36
Misc/NEWS; new revision: 1.508
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-11-04 12:08
Message:
Logged In: YES
user_id=31435
Assigned to me. I think your suggested fix makes good
sense.
----------------------------------------------------------------------
Comment By: Geert Jansen (geertj)
Date: 2002-10-04 03:36
Message:
Logged In: YES
user_id=537938
Sorry -- it seems the file upload went wrong! Second try.
----------------------------------------------------------------------