[Patches] [ python-Patches-914340 ] gzip.GzipFile to accept stream as fileobj.

SourceForge.net noreply at sourceforge.net
Thu Mar 15 16:16:47 CET 2007


Patches item #914340, was opened at 2004-03-11 19:45
Message generated for change (Comment added) made by lucas_malor
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=914340&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Modules
Group: Python 2.4
Status: Closed
Resolution: Out of Date
Priority: 5
Private: No
Submitted By: Igor Belyi (belyi)
Assigned to: Nobody/Anonymous (nobody)
Summary: gzip.GzipFile to accept stream as fileobj.

Initial Comment:
When gzip.GzipFile is initialized with a fileobj that does not have
tell() and seek() methods (a non-rewinding stream), it throws an
exception. The interesting thing is that it doesn't have to. The
following patch updates gzip.py to allow any stream with just a
read() method to be used. This is helpful if you want to be able to
do something like:
gzip.GzipFile(fileobj=urllib.urlopen("file:///README.gz")).readlines()
or use GzipFile with the sys.stdin stream.

But keep in mind that the seek() and rewind() methods of GzipFile
won't work for such a stream even with the patch.
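
For comparison, here is a minimal sketch of the same use case on a current
Python 3 interpreter rather than the 2.4-era module this patch targets.
ReadOnlyStream is a hypothetical stand-in for a urllib response or
sys.stdin: it offers read() and nothing else.

```python
import gzip
import io


class ReadOnlyStream:
    """Hypothetical stand-in for a stream without seek() or tell()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


payload = b"line one\nline two\n"
stream = ReadOnlyStream(gzip.compress(payload))

# mode is passed explicitly because the wrapper has no .mode attribute.
lines = gzip.GzipFile(fileobj=stream, mode="rb").readlines()
print(lines)
```

Only forward reads are exercised here; nothing in the sketch relies on
the underlying object being seekable.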

Igor


----------------------------------------------------------------------

Comment By: Lucas Malor (lucas_malor)
Date: 2007-03-15 16:16

Message:
Logged In: YES 
user_id=1403274
Originator: NO

There's a problem with this patch. If I have previously read some bytes
of the GzipFile object in my code, _read_gzip_header raises IOError, 'Not a
gzipped file', because it starts reading at the current position, not at
the start. Unfortunately, seek() cannot be used on urllib objects. I don't
see any possible workaround.
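
One reading of this report is that some bytes were consumed from the
underlying stream before GzipFile parsed the header. Under that
assumption, the sketch below (Python 3.8+, not the code from this
tracker item) reproduces the failure and then replays the consumed
bytes through a hypothetical PrependedStream wrapper, so no seek() is
needed.

```python
import gzip
import io


class PrependedStream:
    """Hypothetical helper: replays already-consumed bytes, then the rest."""

    def __init__(self, consumed, fileobj):
        self._pending = consumed
        self._fileobj = fileobj

    def read(self, size=-1):
        if size is None or size < 0:
            data, self._pending = self._pending, b""
            return data + self._fileobj.read()
        data, self._pending = self._pending[:size], self._pending[size:]
        if len(data) < size:
            data += self._fileobj.read(size - len(data))
        return data


# mtime=0 keeps the header bytes deterministic for this demonstration.
blob = gzip.compress(b"payload", mtime=0)

# Failure: the header is parsed from the current position, not the start.
raw = io.BytesIO(blob)
raw.read(4)
try:
    gzip.GzipFile(fileobj=raw, mode="rb").read()
    failed = False
except OSError:  # 'Not a gzipped file'
    failed = True

# Possible workaround: hand the consumed bytes back before the stream.
raw = io.BytesIO(blob)
peeked = raw.read(4)
recovered = gzip.GzipFile(fileobj=PrependedStream(peeked, raw), mode="rb").read()
```

This only helps when the caller still holds the bytes it consumed; if
they were discarded, the header cannot be recovered from a
non-seekable stream.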

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2007-03-08 21:59

Message:
Logged In: YES 
user_id=849994
Originator: NO

It looks like Patch #1675951 provides the same feature, plus speedups.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2007-03-06 15:51

Message:
Logged In: YES 
user_id=21627
Originator: NO

The patch in this form is incomplete: it lacks test suite changes. Can
somebody please provide patches to Lib/test/test_gzip.py that exercise
this new functionality?
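
A test along these lines might look like the sketch below. It is
written against a current Python 3 unittest and gzip, not the 2.4-era
test_gzip.py, and both the ReadOnlyStream double and the test names
are hypothetical.

```python
import gzip
import io
import unittest


class ReadOnlyStream:
    """Hypothetical test double: offers read() but no seek() or tell()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


class TestReadOnlyFileobj(unittest.TestCase):
    def test_single_member(self):
        stream = ReadOnlyStream(gzip.compress(b"spam and eggs"))
        with gzip.GzipFile(fileobj=stream, mode="rb") as f:
            self.assertEqual(f.read(), b"spam and eggs")

    def test_multiple_members(self):
        # Two gzip members back to back in one stream.
        blob = gzip.compress(b"spam ") + gzip.compress(b"eggs")
        stream = ReadOnlyStream(blob)
        with gzip.GzipFile(fileobj=stream, mode="rb") as f:
            self.assertEqual(f.read(), b"spam eggs")
```

Saved as a module, the cases can be run with python -m unittest.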

----------------------------------------------------------------------

Comment By: Jakob Truelsen (antialize)
Date: 2006-06-19 10:35

Message:
Logged In: YES 
user_id=379876

Is there any reason this patch has not been accepted? If it is
accepted, I have a patch to urllib2 to (automatically) accept gzipped
content as described here:
http://www.http-compression.com/#client_request
If there is some reason this patch is not acceptable, please give
details so it can be fixed. I'm tired of using popen and gunzip :)

----------------------------------------------------------------------

Comment By: Igor Belyi (belyi)
Date: 2004-03-19 15:14

Message:
Logged In: YES 
user_id=995711

I thought I should add a slightly more verbose explanation of
the changes...

The current implementation of GzipFile() uses tell() and seek()
to reposition the data stream in the following 2 cases:
1. When EOF is reached and the last 8 bytes of the file
contain the checksum and the uncompressed data size.
2. When some 'unused_data' is left after decompression,
meaning that the stream may contain more than one compressed
item.

My change introduces 2 helper buffers:
'inputbuf', which keeps data read from the stream but not yet used, and
'last8', which keeps the last 8 'used' bytes.

In addition, my change introduces a helper method, _read_internal(),
which is used instead of the direct call to
self.fileobj.read(). This method reads data from the stream
as needed by calling self.fileobj.read() and
maintains the correct values of 'inputbuf' and 'last8'.

When case 1 above happens, we use the 'last8' buffer to read
the checksum and size.
When case 2 above happens, we add the value of 'unused_data'
to inputbuf.

There's one more self.fileobj.seek() call
left, in the rewind() method, but it is used only when the rewind() or
seek() methods of the GzipFile class are called. And it would not be
logical to expect those methods to work if the underlying
fileobj does not support them.
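
The buffering idea can be sketched roughly as follows. This is an
illustration in Python 3, not the patch itself: the StreamGunzip and
ReadOnly classes are hypothetical, only the names inputbuf and
_read_internal are borrowed from the description above, and the
separate 'last8' buffer is left out because zlib's unused_data already
hands back the 8-byte trailer, which is then read through inputbuf.
Zero padding after a member is not handled.

```python
import gzip
import io
import struct
import zlib


class StreamGunzip:
    """Illustrative reader that needs only fileobj.read()."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.inputbuf = b""  # bytes read from the stream but not yet used

    def _read_internal(self, size):
        # Serve pushed-back bytes first, then read more from the stream.
        while len(self.inputbuf) < size:
            chunk = self.fileobj.read(size - len(self.inputbuf))
            if not chunk:
                break
            self.inputbuf += chunk
        data, self.inputbuf = self.inputbuf[:size], self.inputbuf[size:]
        return data

    def _skip_header(self):
        head = self._read_internal(10)
        if not head:
            return False  # clean end of stream, no further member
        if len(head) < 10 or head[:2] != b"\x1f\x8b" or head[2] != 8:
            raise OSError("Not a gzipped file")
        flags = head[3]
        if flags & 4:  # FEXTRA: length-prefixed extra field
            (xlen,) = struct.unpack("<H", self._read_internal(2))
            self._read_internal(xlen)
        for bit in (8, 16):  # FNAME, FCOMMENT: zero-terminated strings
            if flags & bit:
                while self._read_internal(1) not in (b"\x00", b""):
                    pass
        if flags & 2:  # FHCRC: 2-byte header checksum
            self._read_internal(2)
        return True

    def read_all(self):
        out = []
        while self._skip_header():
            decomp = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate
            crc, size = 0, 0
            while not decomp.eof:
                chunk = self._read_internal(1024)
                if not chunk:
                    raise EOFError("stream ended inside a gzip member")
                piece = decomp.decompress(chunk)
                crc = zlib.crc32(piece, crc)
                size += len(piece)
                out.append(piece)
            # Case 2: push bytes past the deflate data back, no seek().
            self.inputbuf = decomp.unused_data + self.inputbuf
            # Case 1: the trailer holds the CRC32 and the size mod 2**32.
            trailer = self._read_internal(8)
            if len(trailer) < 8:
                raise EOFError("stream ended inside the gzip trailer")
            want_crc, want_size = struct.unpack("<II", trailer)
            if want_crc != crc or want_size != size & 0xFFFFFFFF:
                raise OSError("CRC or size check failed")
        return b"".join(out)


class ReadOnly:
    """Hypothetical stream exposing only read()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


blob = gzip.compress(b"hello ") + gzip.compress(b"world")
result = StreamGunzip(ReadOnly(blob)).read_all()
```

Pushing unused_data back onto inputbuf is what replaces the backwards
seek(): the next header or trailer read simply consumes those bytes
first.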

Igor


----------------------------------------------------------------------

Comment By: Igor Belyi (belyi)
Date: 2004-03-11 21:04

Message:
Logged In: YES 
user_id=995711

The previous revision of the patch does not work correctly with
multiple compressed members in one stream. I've updated the patch file.

----------------------------------------------------------------------
