[Python-ideas] Gzip and zip extra field
Serhiy Storchaka
storchaka at gmail.com
Wed May 29 15:25:56 CEST 2013
Gzip files can contain an extra field [1], and some applications use
it to extend the gzip format. The current GzipFile implementation
ignores this field on input and does not allow creating a new file with
an extra field.
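For reference, here is a minimal sketch of how the raw extra field can be pulled out of a gzip header by hand, following the layout in [1]. The function name `read_gzip_extra` is made up for illustration and is not part of the gzip module.

```python
import struct

FEXTRA = 0x04  # FLG bit that signals the presence of an extra field


def read_gzip_extra(data):
    """Return the raw extra field of the first gzip member, or b'' if absent.

    Hypothetical helper, not an existing gzip module function.
    """
    # Fixed 10-byte header: ID1, ID2, CM, FLG, MTIME, XFL, OS.
    if len(data) < 10:
        raise ValueError("truncated gzip header")
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack_from("<BBBBIBB", data, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    if not flg & FEXTRA:
        return b""
    # XLEN is a little-endian 16-bit length that follows the fixed header.
    if len(data) < 12:
        raise ValueError("truncated extra field length")
    (xlen,) = struct.unpack_from("<H", data, 10)
    extra = data[12:12 + xlen]
    if len(extra) != xlen:
        raise ValueError("truncated extra field")
    return extra
```

It only looks at the first member and does not validate the compression method or the rest of the stream.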
ZIP file entries can also contain an extra field [2]. Currently it is just
saved as bytes in the `extra` attribute of ZipInfo.
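To illustrate what those bytes hold, the sketch below walks a ZIP extra field as described in [2]: each block is a 2-byte header ID, a 2-byte data size (both little-endian), then the data. The helper name `iter_zip_extra` is an assumption for this example, not an existing zipfile function.

```python
import struct


def iter_zip_extra(extra):
    """Yield (header_id, data) pairs from the bytes in ZipInfo.extra.

    Hypothetical helper, not part of the zipfile module.
    """
    pos = 0
    # Fewer than 4 trailing bytes cannot hold a block header and are ignored.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        data = extra[pos:pos + size]
        if len(data) != size:
            raise ValueError("truncated extra block")
        yield header_id, data
        pos += size
```

With an open zipfile.ZipFile, something like `dict(iter_zip_extra(info.extra))` for each `info` in `infolist()` would give a mapping from header ID to data.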
I propose to save the extra field of a gzip file and provide structured
access to its subfields.
f = gzip.GzipFile('somefile.gz', 'rb')
f.extra_bytes # A raw extra field as bytes
# iterating over all subfields
for xid, data in f.extra_map.items():
    ...
# get Apollo file type information
f.extra_map[b'AP'] # (or f.extra_map['AP']?)
# creating gzip file with extra field
f = gzip.GzipFile('somefile.gz', 'wb', extra=extrabytes)
f = gzip.GzipFile('somefile.gz', 'wb', extra=[(b'AP', apollodata)])
f = gzip.GzipFile('somefile.gz', 'wb', extra={b'AP': apollodata})
# change Apollo file type information
f.extra_map[b'AP'] = ...
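To make the mapping idea concrete, here is a rough sketch of how the raw bytes and a subfield mapping could be converted into each other. Per [1], each subfield is a 2-byte ID (SI1, SI2), a little-endian 2-byte length, then the data. The names `parse_subfields` and `build_extra` are hypothetical and are not taken from the patches.

```python
import struct


def parse_subfields(extra):
    """Split a raw gzip extra field into a dict {subfield_id: data}.

    If an ID occurs more than once, the last occurrence wins in this sketch.
    """
    fields = {}
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        xid = extra[pos:pos + 2]
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        data = extra[pos + 4:pos + 4 + length]
        if len(data) != length:
            raise ValueError("truncated subfield data")
        fields[xid] = data
        pos += 4 + length
    return fields


def build_extra(fields):
    """Serialize a dict or an iterable of (id, data) pairs to raw bytes."""
    items = fields.items() if isinstance(fields, dict) else fields
    out = bytearray()
    for xid, data in items:
        if len(xid) != 2:
            raise ValueError("subfield ID must be exactly 2 bytes")
        out += xid + struct.pack("<H", len(data)) + data
    if len(out) > 0xFFFF:
        raise ValueError("extra field does not fit in 16-bit XLEN")
    return bytes(out)
```

This covers both the list-of-pairs and the dict forms of the `extra` argument shown above; the bytes form would simply be passed through unchanged.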
Issue #17681 [3] has preliminary patches. There are still some open doubts about
the interface. Isn't it over-engineered?
Currently GzipFile supports seamlessly reading a sequence of separately
compressed gzip files. Each such chunk can have its own extra field (this
is used in dictzip, for example). It would be desirable to be able to
read only until the end of the current chunk in order not to miss an extra
field.
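As an illustration of chunk-at-a-time reading, the sketch below uses zlib directly rather than GzipFile. A decompress object created with wbits of MAX_WBITS | 16 stops at the end of one gzip chunk and leaves the rest in `unused_data`, so the header of the next chunk (and any extra field in it) can be inspected before continuing. The function name `iter_gzip_members` is made up for this example, and it works on in-memory bytes only.

```python
import zlib


def iter_gzip_members(data):
    """Yield the decompressed payload of each gzip chunk in `data`, in order.

    Hypothetical helper; assumes the chunks are concatenated with no padding.
    """
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)
        payload = d.decompress(data)
        if not d.eof:
            raise ValueError("truncated gzip chunk")
        yield payload
        # Whatever follows the trailer belongs to the next chunk.
        data = d.unused_data
```

At the top of each loop iteration `data` starts with the header of the current chunk, which is where a per-chunk extra field would be found.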
[1] http://www.gzip.org/format.txt
[2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
[3] http://bugs.python.org/issue17681