compression level with tarfile (w:gz) ?

Lars Gustäbel lars at gustaebel.de
Mon Aug 10 09:41:08 EDT 2009


On Mon, Aug 10, 2009 at 08:50:21AM -0400, Esmail wrote:
> I was wondering if it is possible to specify a compression level when I
> tar/gzip a file in Python using the tarfile module. I would like to
> specify the highest (9) compression level for gzip.

tarfile uses gzip.GzipFile() internally, and GzipFile()'s default compression
level is already 9.
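
If you want to set the level explicitly anyway, tarfile.open() forwards a
compresslevel keyword to GzipFile() in "w:gz" mode. A minimal sketch, where
the helper make_tar_gz(), the member name "data.txt" and the payload are made
up for illustration and the archive is built in memory:

```python
import io
import tarfile


def make_tar_gz(payload: bytes, level: int) -> bytes:
    """Return a .tar.gz archive (as bytes) holding one member, data.txt."""
    buf = io.BytesIO()
    # compresslevel is passed through to gzip.GzipFile(); 9 is the default.
    with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=level) as tar:
        info = tarfile.TarInfo(name="data.txt")  # hypothetical member name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


payload = b"hello tarfile " * 10000  # illustrative data only
fast = make_tar_gz(payload, 1)
best = make_tar_gz(payload, 9)
print(len(payload), len(fast), len(best))
```

Comparing len(fast) and len(best) on your own data shows how much the level
matters for it.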

> When I create a simple tar and then gzip it 'manually' with compression
> level 9, I get a smaller archive than when I have this code execute with
> the w:gz option.

How much smaller is it? I did a test with a recent Linux kernel source tree,
which made an uncompressed tar archive of 337MB. Command-line gzip was ahead of
Python's GzipFile() by just 20200 bytes(!) on a compressed archive of about 74MB.

> Is the only way to accomplish the higher rate to create a tar file
> and then use a different module to gzip it (assuming I can specify
> the compression level there)?

If you need the disk space that badly, the alternative would be to pipe
tarfile's output to command-line gzip somehow:

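import subprocess
import tarfile

fobj = open("result.tar.gz", "wb")  # binary mode: gzip emits raw bytes
proc = subprocess.Popen(["gzip", "-9"], stdin=subprocess.PIPE, stdout=fobj)
tar = tarfile.open(fileobj=proc.stdin, mode="w|")  # uncompressed tar stream
tar.add(...)
tar.close()
proc.stdin.close()  # send EOF so gzip can finish
proc.wait()         # wait until gzip has written everything
fobj.close()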
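
The same streaming idea also works without an external process, if all you
want is to choose the level yourself: write an uncompressed tar stream
("w|") into a gzip.GzipFile object. This is only a sketch; the helper name
write_tar_via_gzipfile() and the in-memory members are assumptions for the
example, and since it uses the same zlib as "w:gz" it will not reproduce
command-line gzip's output byte for byte.

```python
import gzip
import io
import tarfile


def write_tar_via_gzipfile(out, members, level=9):
    """Write the {name: bytes} mapping as a gzip-compressed tar stream to out."""
    with gzip.GzipFile(fileobj=out, mode="wb", compresslevel=level) as gz:
        # "w|" produces a plain tar stream; GzipFile does the compression.
        with tarfile.open(fileobj=gz, mode="w|") as tar:
            for name, data in members.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
write_tar_via_gzipfile(buf, {"a.txt": b"alpha" * 1000, "b.txt": b"beta" * 1000})
archive = buf.getvalue()
print(len(archive))
```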

Cheers,

-- 
Lars Gustäbel
lars at gustaebel.de

A physicist is an atom's way of knowing about atoms.
(George Wald)


