[Patches] [ python-Patches-914340 ] gzip.GzipFile to accept stream as fileobj.

SourceForge.net noreply at sourceforge.net
Mon Jun 19 10:35:13 CEST 2006


Patches item #914340, was opened at 2004-03-11 19:45
Message generated for change (Comment added) made by antialize
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=914340&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Modules
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Igor Belyi (belyi)
Assigned to: Nobody/Anonymous (nobody)
Summary: gzip.GzipFile to accept stream as fileobj.

Initial Comment:
When gzip.GzipFile is initialized with a fileobj that does not have
tell() and seek() methods (a non-rewinding stream), it throws an
exception. The interesting thing is that it doesn't have to. The
following patch updates gzip.py to allow any stream with just a
read() method to be used. This is helpful if you want to be able to
do something like:
gzip.GzipFile(fileobj=urllib.urlopen("file:///README.gz")).readlines()
or use GzipFile with the sys.stdin stream.

But keep in mind that the seek() and rewind() methods of GzipFile
won't work for such a stream even with the patch.
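
For illustration only, here is a minimal sketch of the intended usage. It is
not part of the patch and assumes a current Python 3 interpreter, where
GzipFile can read from an unseekable fileobj. ReadOnlyStream and
read_lines_from_stream are hypothetical names; the class stands in for a
socket-like stream that offers read() but neither seek() nor tell().

```python
import gzip
import io


class ReadOnlyStream:
    """Hypothetical forward-only stream: read() only, no seek()/tell()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


def read_lines_from_stream(stream):
    # Reading straight through only pulls data forward via stream.read().
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        return gz.readlines()


payload = gzip.compress(b"first line\nsecond line\n")
lines = read_lines_from_stream(ReadOnlyStream(payload))
```

Calling seek() or rewind() on the GzipFile would still need support from the
underlying stream, so this sketch only reads forward.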

Igor


----------------------------------------------------------------------

Comment By: Jakob Truelsen (antialize)
Date: 2006-06-19 10:35

Message:
Logged In: YES 
user_id=379876

Is there any reason this patch has not been accepted? If it
is accepted, I have a patch to urllib2 to (automatically)
accept gzipped content as described here
http://www.http-compression.com/#client_request. If there is
some reason this patch is not acceptable, please explain, so it
can be fixed. I'm tired of using popen and gunzip :)

----------------------------------------------------------------------

Comment By: Igor Belyi (belyi)
Date: 2004-03-19 15:14

Message:
Logged In: YES 
user_id=995711

I thought I should add a slightly more verbose explanation of
the changes...

The current implementation of GzipFile() uses tell() and seek()
to reposition within the stream of data in the following 2 cases:
1. When EOF is reached and the last 8 bytes of the file
contain the checksum and the uncompressed data size.
2. When some 'unused_data' is left after decompression,
meaning that the stream may contain more than one compressed
item.

My change introduces 2 helper buffers:
'inputbuf', which keeps data read from the stream but not yet used, and
'last8', which keeps the last 8 'used' bytes.

In addition, my change introduces a helper method, _read_internal(),
which is used instead of the direct call to
self.fileobj.read(). This method reads data from the stream
as needed by calling self.fileobj.read() and
maintains the correct values of 'inputbuf' and 'last8'.

When case 1 above happens, we use the 'last8' buffer to read
the checksum and size.
When case 2 above happens, we add the value of 'unused_data'
to inputbuf.

There's one more instance of the self.fileobj.seek() call
left in the rewind() method, but it is used only when the rewind() or
seek() methods of the GzipFile class are used. And it wouldn't be
logical to expect those methods to work if the underlying
fileobj does not support them.
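
The handling of case 2 can be sketched roughly as follows. This is not the
patch itself: it is a simplified illustration for a current Python 3, and it
lets zlib parse the gzip header and verify the trailer (wbits of
16 + MAX_WBITS) instead of keeping a separate 'last8' buffer. The names
ForwardOnly and iter_gunzip are hypothetical; only 'inputbuf' is borrowed
from the description above.

```python
import gzip
import io
import zlib


class ForwardOnly:
    """Hypothetical stream exposing read() only."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


def iter_gunzip(stream, chunk_size=1024):
    """Yield decompressed chunks from a forward-only stream of gzip members."""
    inputbuf = b""  # bytes read from the stream but not yet used
    decomp = None
    while True:
        if not inputbuf:
            inputbuf = stream.read(chunk_size)
            if not inputbuf:
                break  # underlying stream is exhausted
        if decomp is None:
            # Assumption: let zlib handle the gzip header and trailer check.
            decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out = decomp.decompress(inputbuf)
        if decomp.eof:
            # Leftover bytes start the next member: keep them in the
            # buffer instead of seeking backwards in the stream.
            inputbuf = decomp.unused_data
            decomp = None
        else:
            inputbuf = b""
        if out:
            yield out
    if decomp is not None:
        raise EOFError("stream ended inside a gzip member")


two_members = gzip.compress(b"alpha\n") + gzip.compress(b"beta\n")
result = b"".join(iter_gunzip(ForwardOnly(two_members), chunk_size=7))
```

The small chunk_size in the usage lines is only there to force member
boundaries to fall in the middle of a read, which is the situation where
the original code had to seek back.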

Igor


----------------------------------------------------------------------

Comment By: Igor Belyi (belyi)
Date: 2004-03-11 21:04

Message:
Logged In: YES 
user_id=995711

The previous revision of the patch does not work correctly with
multiple compressed members in one stream. I've updated the patch file.

----------------------------------------------------------------------
