tarfile.open(mode='w:gz'|'w|gz'|..., fileobj=StringIO()) fails.

sebastian.noack at googlemail.com
Mon May 26 16:44:28 EDT 2008


Hi,

Is there a way to use tarfile to create a gzip- or bzip2-compressed
archive in memory, or at least a reason why I cannot?

You might want to answer "use StringIO", but this isn't as easy as it
seems. ;) I am using Python 2.5.2, by the way. I think this is a bug,
at least in this version of Python, but maybe StringIO just isn't
file-like enough for this quirky tarfile module. That, however, would
conflict with its documentation:

"For special purposes, there is a second format for mode: 'filemode|
[compression]'. open() will return a TarFile object that processes its
data as a stream of blocks. No random seeking will be done on the
file. If given, fileobj may be any object that has a read() or write()
method (depending on the mode)."

Sounds good, but doesn't work. ;P StringIO provides read() and write()
methods, among others. But tarfile has problems with the StringIO
object, even in this stream mode.

I extracted the code from my project into a standalone Python script
to demonstrate the issue at the lowest level. You can run the script
below as follows: ./StringIO-tarfile.py file1 [file2] [...]


#!/usr/bin/env python
#
# File: StringIO-tarfile.py
#

from StringIO import StringIO
import tarfile
import sys

def create_tar_file(filenames, fileobj, mode, result_cb=lambda f: None):
	tar_file = tarfile.open(mode=mode, fileobj=fileobj)
	for f in filenames:
		tar_file.add(f)
	# Note: the callback runs while the archive is still open.
	result = result_cb(fileobj)
	tar_file.close()
	return result

if __name__ == '__main__':
	files = sys.argv[1:]
	modes = ['w%s%s' % (x, y) for x in (':', '|') for y in ('', 'gz', 'bz2')]

	string_io_cb = lambda f: f.getvalue()

	for mode in modes:
		ext = mode.replace('w|', '-pipe.tar.').replace('w:', '.tar.').rstrip('.')
		# StringIO test.
		content = create_tar_file(files, StringIO(), mode, string_io_cb)
		fd = open('StringIO%s' % ext, 'w')
		fd.write(content)
		fd.close()

		# file object test.
		fd = open('file%s' % ext, 'w')
		create_tar_file(files, fd, mode)


As test input, I used a directory containing a single text file. As
you can see below, all tests using plain file objects were successful.
With StringIO, however, only the uncompressed archives list their
contents, and even the 'w|' one ends with an unexpected EOF. Although
I don't get any errors when creating them, most of the StringIO files
are empty or truncated.


$ for f in `ls *.tar{,.gz,.bz2}`; do echo -n $f; du -h $f | awk '{print " ("$1"B)"}'; tar -tf $f; echo; done

file-pipe.tar (84KB)
foo/
foo/ksp-fosdem2008.txt

file-pipe.tar.bz2 (20KB)
foo/
foo/ksp-fosdem2008.txt

file-pipe.tar.gz (20KB)
foo/
foo/ksp-fosdem2008.txt

file.tar (84KB)
foo/
foo/ksp-fosdem2008.txt

file.tar.bz2 (20KB)
foo/
foo/ksp-fosdem2008.txt

file.tar.gz (20KB)
foo/
foo/ksp-fosdem2008.txt

StringIO-pipe.tar (76KB)
foo/
foo/ksp-fosdem2008.txt
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

StringIO-pipe.tar.bz2 (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors

StringIO-pipe.tar.gz (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors

StringIO.tar (76KB)
foo/
foo/ksp-fosdem2008.txt

StringIO.tar.bz2 (0B)
tar: This does not look like a tar archive
tar: Error exit delayed from previous errors

StringIO.tar.gz (4.0KB)

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error exit delayed from previous errors


Can somebody reproduce this problem? Did I misunderstand the API? What
would be the best workaround, if I am right? I am thinking about
using the gzip and bz2 modules directly.
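
One ordering that may be worth ruling out first: in the script above the
buffer is read while the archive is still open, and a compressor can
hold back data until close(). The sketch below reads the buffer only
after TarFile.close() has returned. This is an assumption about the
cause, not a confirmed diagnosis. It also assumes a Python version that
provides io.BytesIO (on 2.5, StringIO.StringIO would take its place),
and the names build_archive, list_archive and the members dict are
hypothetical, introduced only for illustration.

```python
import io
import tarfile


def build_archive(members, mode):
    """Build a tar archive in memory and return its bytes.

    members is a dict mapping archive names to byte strings
    (a hypothetical input shape chosen to avoid touching the disk).
    """
    buf = io.BytesIO()
    tar = tarfile.open(mode=mode, fileobj=buf)
    try:
        for name, payload in sorted(members.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    finally:
        # Closing writes the end-of-archive blocks and flushes the
        # compressor; the buffer is only complete after this point.
        tar.close()
    return buf.getvalue()


def list_archive(data):
    """Return the member names of an in-memory archive."""
    tar = tarfile.open(mode='r:*', fileobj=io.BytesIO(data))
    try:
        return tar.getnames()
    finally:
        tar.close()


if __name__ == '__main__':
    members = {'foo/example.txt': b'hello tarfile\n'}
    for mode in ('w:', 'w:gz', 'w:bz2', 'w|', 'w|gz', 'w|bz2'):
        data = build_archive(members, mode)
        print('%s %d bytes %r' % (mode, len(data), list_archive(data)))
```

If the archives produced this way list correctly, the same change in
the script above would be to call result_cb(fileobj) after
tar_file.close() rather than before it.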

Regards
Sebastian Noack


