Tar-like module?

Robert Amesz sheershion at mailexpire.com
Fri Dec 21 09:09:29 EST 2001


Scott Fenton wrote:

> This may be an idiotic question, but is there a module
> to read tar files in python? I googled around and 
> couldn't find anything, and the Python Std Library docs
> didn't have anything. Any help would be nice.

Well, I wrote such a module for my own use a few months ago, in 
pure Python. It can both read and write TAR files, or rather TAR 
streams. I've used it to re-archive about a GB of data[1] without any 
hiccoughs[2], so any bugs will most likely be minor ones. (Of course, 
standard disclaimers ALWAYS apply.)

If you want it you're welcome to it, but as I never expected it to be 
used by anyone but myself there are a few caveats:

1 - There's no documentation, not even docstrings, so I really should 
whip up some very basic documentation first. Fortunately, there is a 
module I wrote for initial testing, which at least has an example of 
how to write .tgz files and read straight .tar files.

2 - It has been tested under Win98 _only_. There really isn't any 
platform dependent code in it as far as I know, but, well, untested is 
untrusted.

3 - Although I've used the module to read and write a lot of files, my 
prime concern has been the integrity of the data. Things like file 
modes and file ownership were not important to me, and therefore the 
module should not be trusted in that respect.

4 - The module has been conceived so data flows into it and out of it 
as streams, using only read() / write(). (This is to make it easier to 
use (de)compression efficiently.) This does imply, however, that you can't 
get a list of files from the archive without reading the entire 
archive, and if you need just a single file you must read all files 
preceding it. For compressed TAR files this is not an issue, as the 
(de)compressor can't start in the middle of a stream anyway, but for 
other applications this might be very inefficient. Also, when writing, 
the size of each file must be known in advance.
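
To illustrate point 4, here is a minimal sketch of forward-only reading 
that uses nothing but read(). It is not the module itself: the names 
read_exact and iter_members are made up for this example, every entry is 
assumed to be a regular file, and the demo archive is built with the 
standard tarfile module from later Python versions. The header offsets 
(name in bytes 0-99, octal size in bytes 124-135) are those of the usual 
512-byte tar header.

```python
import io
import tarfile  # used only to build a small demo archive

BLOCK = 512  # tar archives consist of 512-byte blocks


def read_exact(stream, n):
    """Collect exactly n bytes using nothing but stream.read()."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise EOFError("tar stream ended unexpectedly")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def iter_members(stream):
    """Yield (name, data) pairs, reading strictly forward.

    Simplification: every header is treated as a regular file; type
    flags, long names and extended headers are ignored.
    """
    while True:
        try:
            header = read_exact(stream, BLOCK)
        except EOFError:
            return  # stream ended without an end-of-archive marker
        if header == b"\0" * BLOCK:
            return  # an all-zero block marks the end of the archive
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        size_field = header[124:136].strip(b" \0").decode("ascii")
        size = int(size_field or "0", 8)  # size is stored as octal text
        data = read_exact(stream, size)
        read_exact(stream, -size % BLOCK)  # skip padding to the block boundary
        yield name, data


# Demo: build an archive in memory, then walk it front to back.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for name, payload in [("a.txt", b"hello"), ("b.bin", b"x" * 1000)]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)  # the size goes into the header, before the data
        tf.addfile(info, io.BytesIO(payload))
buf.seek(0)
members = list(iter_members(buf))
```

Because each header only says how long the following data is, the only 
way to reach the second file is to read (or at least skip) the first. 
The same layout is why a writer needs each size up front: the header 
carrying the size precedes the data in the stream.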


Robert Amesz
-- 
[1] Making .zip archives into .tbz2 (tar.bz2) archives can free up a 
lot of space on your HD.

[2] Every TAR file which was written by the write routines was verified 
by using the read routines and comparing SHA checksums with the 
original files. Also, I've used some standard archivers to do some spot 
checks, and those were happy with the re-archived files, too.
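
For anyone wanting to try the same kind of conversion and check today, 
here is a rough sketch of footnotes [1] and [2] combined. It uses the 
standard zipfile, tarfile and hashlib modules rather than my module, 
SHA-1 stands in for "SHA", and the function names zip_to_tbz2 and 
digests_match are invented for this example. Everything is kept in 
memory to keep the sketch short; real archives would be streamed from 
and to disk.

```python
import hashlib
import io
import tarfile
import zipfile


def zip_to_tbz2(zip_bytes):
    """Re-archive a .zip (given as bytes) into .tar.bz2 bytes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf, \
            tarfile.open(fileobj=out, mode="w|bz2") as tf:
        for zinfo in zf.infolist():
            if zinfo.is_dir():
                continue  # only regular files are copied in this sketch
            tinfo = tarfile.TarInfo(zinfo.filename)
            tinfo.size = zinfo.file_size  # size is known before writing
            with zf.open(zinfo) as member:
                tf.addfile(tinfo, member)
    return out.getvalue()


def digests_match(zip_bytes, tbz2_bytes):
    """Compare SHA-1 digests of every file in both archives."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        expected = {
            name: hashlib.sha1(zf.read(name)).hexdigest()
            for name in zf.namelist()
            if not name.endswith("/")
        }
    actual = {}
    # "r|bz2" reads the archive as a forward-only compressed stream.
    with tarfile.open(fileobj=io.BytesIO(tbz2_bytes), mode="r|bz2") as tf:
        for member in tf:
            if member.isfile():
                data = tf.extractfile(member).read()
                actual[member.name] = hashlib.sha1(data).hexdigest()
    return expected == actual


# Demo with a small in-memory .zip archive.
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("dir/b.bin", bytes(range(256)) * 10)
archive = zip_to_tbz2(src.getvalue())
ok = digests_match(src.getvalue(), archive)
```

Note that file modes, ownership and timestamps are not carried over 
here, only names and data.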



More information about the Python-list mailing list