Silly question about ungzipping a file

Mike C. Fletcher mcfletch at rogers.com
Thu Sep 26 17:21:39 EDT 2002


Try something like (untested):

import gzip

chunk = 4096*1024 # 4MB chunks, play with this value to suit
myFile = gzip.open( 'blah.gz' ) # gzip.open defaults to binary ('rb') mode
outFile = open( 'blah','wb') # note wb mode for windows machines!
data = myFile.read( chunk )
while data:
	outFile.write( data )
	data = myFile.read( chunk )
outFile.close()
myFile.close()
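
A more compact variant of the same idea, offered as a sketch rather than a tested recipe: shutil.copyfileobj in the standard library runs the same read/write loop with a bounded buffer. The function name gunzip and its parameters are only illustrative, and the with statements assume a Python version where gzip files support them.

```python
import gzip
import shutil

def gunzip(src, dst, chunk=4096 * 1024):
    """Decompress src into dst, holding at most `chunk` bytes in memory at once."""
    # Both files are opened in binary mode; 'wb' matters on Windows.
    with gzip.open(src, 'rb') as inFile, open(dst, 'wb') as outFile:
        # copyfileobj reads and writes in pieces of `chunk` bytes until EOF.
        shutil.copyfileobj(inFile, outFile, chunk)
```

Usage would be something like gunzip('blah.gz', 'blah'); a smaller chunk trades some speed for a lower memory footprint.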

HTH,
Mike

Lemniscate wrote:
> Hi everyone,
> 
> This may be a ridiculously easy question, but I've kind of hit a wall
> and I was wondering if I am just missing something.  I want to
> automate the retrieval and unzipping of a *.gz file.  The issue is
> that the file, when it is unzipped, usually has a size somewhere in
> the range of 128MB.  Occasionally, I get memory errors when I try to
> run it.  Here is a quick idea of what I am doing...
> 
> 
> >>> import gzip
> >>> myFile = gzip.open('LL_tmpl.gz')
> >>> file('output.txt', 'w').write(myFile.read())
> 
> Now, is there any way to do this so that less memory is used?  I mean,
> if I wanted to do some processing on the resulting output file, I
> would use xreadlines or something like that to keep memory consumption
> to a minimum.  Is there something roughly equivalent that I am not
> noticing in the gzip documentation?  Let me also say that I have tried
> the following as well:
> 
> 
> >>> myFile = gzip.open('LL_tmpl.gz')
> >>> fout = file('output.txt', 'w')
> >>> while myFile.readline():
> ... 	fout.write(myFile.readline())
> ... 	fout.write('\n')
> ... 	
> >>> fout.close()
> >>> myFile = gzip.open('LL_tmpl.gz')
> >>> fout = file('output2.txt', 'w')
> >>> while myFile.readline():
> ... 	fout.writelines(myFile.readline())
> ... 	
> >>> fout.close()
> 
> 
> These do solve my memory problem, but there are other issues.  First
> of all, my CPU gets pegged and it takes FOREVER (okay, not forever,
> but about ...let me test real quick... at least 4-5 times as long, and
> my computer is pretty much useless during that time (side note:  can
> you tell I am working on a woefully underpowered machine?)).  Is there
> something in-between that anybody can think of?  The other, and much
> more immediate, issue is puzzling to me.  It seems that the resulting
> files from the code are only about 65MB (64.7 to be exact) versus
> 129MB.  I'm sure I'm just missing something simple, but why is that? 
> Thanks for your time.
> 
> Lem
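
On the halved file size: in the quoted loops readline() is called twice per iteration, once in the while test (that line is thrown away) and once for the write, so only every other line reaches the output. A small in-memory sketch of the effect follows; the helper names copy_skipping and copy_all and the sample data are made up for illustration, with io.BytesIO standing in for the gzip file.

```python
import io

def copy_skipping(src, dst):
    # Same shape as the quoted loop: the readline() in the while test
    # consumes a line that is never written.
    while src.readline():
        dst.write(src.readline())

def copy_all(src, dst):
    # Iterating over the file object yields each line exactly once.
    for line in src:
        dst.write(line)

# Ten 7-byte lines: "line 0\n" through "line 9\n".
data = b''.join(b'line %d\n' % i for i in range(10))

skipped, full = io.BytesIO(), io.BytesIO()
copy_skipping(io.BytesIO(data), skipped)
copy_all(io.BytesIO(data), full)

# The skipping copy ends up with half the bytes of the full copy.
print(len(full.getvalue()), len(skipped.getvalue()))
```

Line-by-line copying fixed this way still costs more CPU than reading in large chunks, which is presumably where the slowdown comes from.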

-- 
_______________________________________
   Mike C. Fletcher
   Designer, VR Plumber, Coder
   http://members.rogers.com/mcfletch/





