parse tar file with python

Eddie Corns eddie at holyrood.ed.ac.uk
Thu Jun 13 09:53:22 EDT 2002


"Shagshag13" <shagshag13 at yahoo.fr> writes:

>In fact, I don't want to tar/untar files, and especially not in main memory!

>I wish I could read the tar-ed file line by line (f.readline) and be able to check when I find the beginning of an "inside file" and
>get some info about it like name, how and so on... (that's because my original files are plain text files, and I think that tar will
>leave them unchanged)

>In my tar file there is, for example, a kind of separator like this (but with everything on one long line):

>shag.py_0100744_0002033_0001750_00000004414_07500237361_0015314_0_ustar_00_shagshag_user_0000040_0000417_beginofmyfilehere

>where _ stands for a variable number of another ASCII code that I can't cut/paste...

>Do you know what each means (for example the first one is undoubtedly the file name, but then...)?
>And what are the fixed positions of each of these?
>Any clue?

A google search on 'tar file format directory header size' came up with this:

  http://www.mkssoftware.com/docs/man4/tar.4.asp
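For reference, here is a minimal sketch of walking an uncompressed archive by hand. It assumes the POSIX ustar header layout (the "ustar" in the sample line suggests that format): each member starts with a 512-byte header block of fixed-width fields (name 100 bytes, mode 8, uid 8, gid 8, size 12, mtime 12, chksum 8, typeflag 1, linkname 100, magic 6, version 2, uname 32, gname 32, devmajor 8, devminor 8, prefix 155), and numeric fields are stored as NUL- or space-terminated octal text. Read that way, the "_" runs in the sample line are NUL padding, mode 0100744 is a regular file with permissions 744, and size 00000004414 (octal) is 2316 bytes. The names `parse_header` and `iter_members` are made up for this example, and GNU extensions (long names, base-256 numbers) are not handled.

```python
BLOCK_SIZE = 512

# (field, width in bytes) in POSIX ustar header order; these add up to 500
# bytes, and the remaining 12 bytes of the 512-byte block are padding.
USTAR_FIELDS = [
    ("name", 100), ("mode", 8), ("uid", 8), ("gid", 8),
    ("size", 12), ("mtime", 12), ("chksum", 8), ("typeflag", 1),
    ("linkname", 100), ("magic", 6), ("version", 2),
    ("uname", 32), ("gname", 32), ("devmajor", 8), ("devminor", 8),
    ("prefix", 155),
]
# Fields stored as octal ASCII digits (GNU base-256 encoding is not handled).
NUMERIC_FIELDS = {"mode", "uid", "gid", "size", "mtime", "chksum",
                  "devmajor", "devminor"}


def parse_header(block):
    """Decode one 512-byte ustar header block into a dict of fields."""
    header = {}
    offset = 0
    for field, width in USTAR_FIELDS:
        raw = block[offset:offset + width]
        offset += width
        # Fields are NUL-terminated or NUL-padded; latin-1 never fails to decode.
        text = raw.split(b"\0", 1)[0].decode("latin-1")
        if field in NUMERIC_FIELDS:
            # Some writers pad numbers with spaces as well as NULs.
            header[field] = int(text.strip() or "0", 8)
        else:
            header[field] = text
    return header


def iter_members(f):
    """Yield (header, data) for each member of an uncompressed tar stream `f`
    opened in binary mode, reading strictly front to back."""
    while True:
        block = f.read(BLOCK_SIZE)
        # A zero-filled block marks the end of the archive; stop on short reads too.
        if len(block) < BLOCK_SIZE or block == bytes(BLOCK_SIZE):
            return
        header = parse_header(block)
        size = header["size"]
        data = f.read(size)
        # Member data is padded with NULs up to the next 512-byte boundary.
        f.read(-size % BLOCK_SIZE)
        yield header, data
```

To use it, open the archive in binary mode (for example a hypothetical `archive.tar` with `open("archive.tar", "rb")`) and loop over `iter_members(f)`; each member's data comes back as bytes, so `data.splitlines()` gives its lines. Each member is read into memory whole here, so very large members would need chunked reads instead.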

I did write a tar program in Bliss10 many years (nay decades) ago and recall
it was quite easy to deal with.

Using tar is only useful if you know for certain that you're going to process
each file in exact sequence.  I think other solutions like making smaller
directories would be easier.
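Python's standard library later gained a `tarfile` module (added in Python 2.3, after this thread), and its stream mode `"r|"` enforces exactly that in-order access: each member can only be read as it goes past. A minimal sketch, assuming uncompressed members holding text in a single encoding; `iter_text_lines` is a hypothetical helper name:

```python
import tarfile


def iter_text_lines(path, encoding="utf-8"):
    """Yield (member name, line) pairs, reading members strictly in archive order."""
    # Mode "r|" opens the archive as a forward-only stream: no seeking back.
    with tarfile.open(path, mode="r|") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, devices, ...
            fileobj = archive.extractfile(member)
            # Each member must be consumed before moving on to the next one.
            for raw_line in fileobj:
                yield member.name, raw_line.decode(encoding)
```

Because the stream cannot rewind, going back to an earlier member means reopening the archive, which is the limitation described above.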

Surely if all you're doing is reading all the files in sequence, you can just
concatenate them into one? If you need to know where each one ends, add a unique
separator.
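A minimal sketch of that alternative, assuming plain-text inputs; the marker string and the helper names `concatenate` and `iter_lines` are hypothetical. Putting the original file name on the marker line keeps the per-file name that the tar header would otherwise have provided:

```python
import os

# Hypothetical marker; it must never occur at the start of a line in the inputs.
SEPARATOR = "@@@@ FILE-SEPARATOR @@@@ "


def concatenate(paths, out_path):
    """Write every file in `paths` into `out_path`, each preceded by a marker line."""
    with open(out_path, "w") as out:
        for path in paths:
            out.write(SEPARATOR + os.path.basename(path) + "\n")
            last = "\n"
            with open(path) as src:
                for line in src:
                    out.write(line)
                    last = line
            if not last.endswith("\n"):
                out.write("\n")  # keep the next marker on its own line


def iter_lines(path):
    """Read the combined file line by line, yielding (original file name, line).
    An empty input file contributes a marker but no lines."""
    name = None
    with open(path) as f:
        for line in f:
            if line.startswith(SEPARATOR):
                name = line[len(SEPARATOR):].rstrip("\n")
            else:
                yield name, line
```

Only one line is held in memory at a time, and a change in the yielded name marks the start of the next original file.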

Eddie


