[Python-ideas] Gzip and zip extra field

Serhiy Storchaka storchaka at gmail.com
Wed May 29 15:25:56 CEST 2013


Gzip files can contain an extra field [1], and some applications use 
it to extend the gzip format. The current GzipFile implementation 
ignores this field on input and does not allow creating a new file with 
an extra field.
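
To make the layout concrete, here is a minimal sketch of reading the 
extra field by hand, following the header description in [1]. The 
helper names read_gzip_extra and split_subfields are hypothetical; they 
are not part of the gzip module or of the patches discussed here.

```python
import gzip
import io
import struct

FEXTRA = 0x04  # FLG bit that marks the presence of an extra field


def read_gzip_extra(fileobj):
    """Return the raw extra field of the first gzip member, or b'' if absent.

    Hypothetical helper, for illustration only.
    """
    header = fileobj.read(10)  # ID1 ID2 CM FLG MTIME(4) XFL OS
    if len(header) < 10 or header[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip file")
    flags = header[3]
    if not flags & FEXTRA:
        return b""
    (xlen,) = struct.unpack("<H", fileobj.read(2))  # little-endian XLEN
    extra = fileobj.read(xlen)
    if len(extra) != xlen:
        raise ValueError("truncated extra field")
    return extra


def split_subfields(extra):
    """Split a raw extra field into (subfield_id, data) pairs."""
    subfields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        xid = extra[pos:pos + 2]  # SI1, SI2
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        data = extra[pos + 4:pos + 4 + length]
        if len(data) != length:
            raise ValueError("truncated subfield data")
        subfields.append((xid, data))
        pos += 4 + length
    return subfields


# A hand-built header that carries a single b'AP' subfield.
sample_extra = b"AP" + struct.pack("<H", 3) + b"abc"
sample_header = (b"\x1f\x8b\x08\x04" + b"\x00" * 4 + b"\x00\xff"
                 + struct.pack("<H", len(sample_extra)) + sample_extra)
raw = read_gzip_extra(io.BytesIO(sample_header))
print(split_subfields(raw))

# Data written by the gzip module itself carries no extra field.
plain = read_gzip_extra(io.BytesIO(gzip.compress(b"data")))
```

Only the first member is inspected, and the optional file name, comment 
and header CRC that may follow the extra field are not parsed.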

ZIP file entries can also contain an extra field [2]. Currently it is 
just saved as bytes in the `extra` attribute of ZipInfo.
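
For comparison, a minimal sketch of what structured access to the ZIP 
extra field could look like on top of the existing `extra` bytes. The 
function iter_zip_extra and the header ID 0xCAFE are made up for this 
example; only ZipInfo.extra itself is existing API.

```python
import io
import struct
import zipfile


def iter_zip_extra(extra):
    """Yield (header_id, data) pairs from raw ZipInfo.extra bytes.

    Hypothetical helper; each subfield is a 2-byte little-endian
    header ID, a 2-byte little-endian size, then the data.
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        yield header_id, extra[pos + 4:pos + 4 + size]
        pos += 4 + size


# Write an in-memory archive whose entry carries one made-up subfield.
info = zipfile.ZipInfo("hello.txt")
info.extra = struct.pack("<HH", 0xCAFE, 4) + b"demo"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(info, b"hello")

# Read it back and split the raw bytes into subfields.
with zipfile.ZipFile(buf) as zf:
    fields = list(iter_zip_extra(zf.getinfo("hello.txt").extra))
print(fields)
```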

I propose to keep the extra field of a gzip file and to provide 
structured access to its subfields.

f = gzip.GzipFile('somefile.gz', 'rb')
f.extra_bytes  # the raw extra field as bytes
# iterate over all subfields
for xid, data in f.extra_map.items():
    ...
# get the Apollo file type information
f.extra_map[b'AP']  # (or f.extra_map['AP']?)
# create a gzip file with an extra field
f = gzip.GzipFile('somefile.gz', 'wb', extra=extrabytes)
f = gzip.GzipFile('somefile.gz', 'wb', extra=[(b'AP', apollodata)])
f = gzip.GzipFile('somefile.gz', 'wb', extra={b'AP': apollodata})
# change the Apollo file type information
f.extra_map[b'AP'] = ...
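
Until something like the above exists, a file with an extra field has 
to be assembled by hand. The following is only a sketch of that 
workaround, not the proposed interface; gzip_with_extra is a 
hypothetical name, and the header layout follows [1].

```python
import gzip
import struct
import zlib


def gzip_with_extra(payload, subfields):
    """Build one gzip member whose header carries the given subfields.

    Hypothetical helper; subfields is an iterable of (2-byte id, data).
    """
    parts = []
    for xid, data in subfields:
        if len(xid) != 2:
            raise ValueError("subfield id must be exactly 2 bytes")
        parts.append(xid + struct.pack("<H", len(data)) + data)
    extra = b"".join(parts)
    # ID1 ID2, CM=8 (deflate), FLG=FEXTRA, MTIME=0, XFL=0, OS=255, XLEN
    header = struct.pack("<2sBBIBBH", b"\x1f\x8b", 8, 0x04, 0, 0, 255,
                         len(extra))
    # Negative wbits selects a raw deflate stream without zlib framing.
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    body = compressor.compress(payload) + compressor.flush()
    # Trailer: CRC32 and size of the uncompressed data, modulo 2**32.
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF,
                          len(payload) & 0xFFFFFFFF)
    return header + extra + body + trailer


blob = gzip_with_extra(b"hello", [(b"AP", b"abc")])
# The gzip module skips the extra field but still reads the payload.
print(gzip.decompress(blob))
```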

Issue #17681 [3] has preliminary patches. There are still some doubts 
about the interface. Isn't it over-engineered?

Currently GzipFile supports seamless reading of a sequence of separately 
compressed gzip files. Each such chunk can have its own extra field (this 
is used in dictzip, for example). It would be desirable to be able to 
read only until the end of the current chunk in order not to miss an 
extra field.
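
As an illustration of reading one chunk at a time, the boundaries can 
be found with zlib directly. This is a rough sketch that holds the whole 
stream in memory; split_members is a hypothetical name and this is not 
how the patches in the issue work.

```python
import gzip
import zlib


def split_members(data):
    """Yield (raw_member_bytes, payload) for each gzip member in data.

    Hypothetical helper; raises on truncated or non-gzip input.
    """
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        payload = d.decompress(data)
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Everything after the member's trailer is left in unused_data.
        consumed = len(data) - len(d.unused_data)
        yield data[:consumed], payload
        data = d.unused_data


stream = gzip.compress(b"first") + gzip.compress(b"second")
members = list(split_members(stream))
print([payload for _, payload in members])
```

Each raw member starts with its own header, so its extra field can be 
inspected before moving on to the next chunk.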

[1] http://www.gzip.org/format.txt
[2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
[3] http://bugs.python.org/issue17681



