[Tutor] gzip

Stefan Behnel stefan_ml at behnel.de
Mon Aug 8 10:34:15 CEST 2011


questions anon, 08.08.2011 01:57:
> Thank you, I didn't realise that was all I needed.
> Moving on to the next problem:
> I would like to loop through a number of directories and decompress each
> *.gz file and leave them in the same folder but the code I have written only
> seems to focus on the last folder. Not sure where I have gone wrong.
> Any feedback will be greatly appreciated.
>
>
> import gzip
> import os
>
> MainFolder=r"D:/DSE_work/temp_samples/"
>
> for (path, dirs, files) in os.walk(MainFolder):
>      for dir in dirs:
>          outputfolder=os.path.join(path,dir)
>          print "the path and dirs are:", outputfolder
>      for gzfiles in files:
>          print gzfiles
>          if gzfiles[-3:]=='.gz':
>              print 'dealing with gzfiles:', dir, gzfiles
>              f_in=os.path.join(outputfolder,gzfiles)
>              print f_in
>              compresseddata=gzip.GzipFile(f_in, "rb")
>              newFile=compresseddata.read()
>              f_out=open(f_in[:-3], "wb")
>              f_out.write(newFile)

Note how "outputfolder" is set and reset in the first inner loop, *before*
the second inner loop starts. By the time the files are processed, it only
holds the last subdirectory. Instead, build the file name once from "path",
which os.walk() already sets to the directory that contains "files". You
don't need to loop over the directories at all (as far as I understand
your intention).
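
Something along these lines might do it. This is only a sketch: the
function name "decompress_tree" and the returned list are made up for
illustration, and it assumes Python 2.7 or later (for the "with"
statement on gzip files) and that every name ending in ".gz" really is a
gzip file.

```python
import gzip
import os


def decompress_tree(main_folder):
    """Decompress every *.gz file below main_folder, next to the original.

    Returns the list of files that were written (hypothetical helper,
    not part of the original script).
    """
    written = []
    for path, dirs, files in os.walk(main_folder):
        # "path" is the directory that contains "files", so there is
        # no need to look at "dirs" here.
        for name in files:
            if not name.endswith('.gz'):
                continue
            f_in = os.path.join(path, name)
            f_out = f_in[:-3]  # strip the trailing ".gz"
            with gzip.open(f_in, 'rb') as compressed:
                data = compressed.read()  # still reads the whole file
            with open(f_out, 'wb') as out:
                out.write(data)
            written.append(f_out)
    return written
```

You would call it as decompress_tree(MainFolder) with the folder from
your script.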

Also, see the shutil module. It has a function, copyfileobj(), that
efficiently copies data between open file(-like) objects. With that, you
can avoid reading the whole file into memory.
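
For a single file, that could look roughly like this. Again just a
sketch: "gunzip_file" is a made-up name, and it assumes the input name
ends in ".gz" and Python 2.7 or later.

```python
import gzip
import shutil


def gunzip_file(f_in):
    """Decompress f_in (a path ending in '.gz') to the same name without '.gz'.

    Hypothetical helper for illustration. The data is copied in chunks
    by shutil.copyfileobj(), so the whole file is never held in memory.
    """
    f_out = f_in[:-3]  # strip the trailing ".gz"
    with gzip.open(f_in, 'rb') as src:
        with open(f_out, 'wb') as dst:
            shutil.copyfileobj(src, dst)
    return f_out
```

copyfileobj() also takes an optional third argument for the chunk size
if the default does not suit you.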

Stefan


