How to process a very large (4Gb) tarfile from python?

Lars Gustäbel lars at gustaebel.de
Sat Jul 19 06:11:11 EDT 2008


On Thu, Jul 17, 2008 at 11:41:50PM -0700, Uwe Schmitt wrote:
> On 17 Jul., 22:21, Lars Gustäbel <l... at gustaebel.de> wrote:
> >
> > > Maybe we should post this issue to python-dev mailing list.
> > > Parsing large tar-files is not uncommon.
> >
> > This issue is known and was fixed for Python 3.0, see http://bugs.python.org/issue2058.
> 
> The proposed patch does not avoid caching the previous values of the
> iterator, it just reduces the size of each cached object.
> It would be nice to be able to avoid caching on demand, which would
> make iteration independent of the size of the tar file.

The size of the archive doesn't matter; it's the number of members that
counts. And I wouldn't call it caching either. The members are stored in
order to have a table of contents and to allow random access. The members
list is also required for resolving hard links within the archive. It cannot
be dropped without side-effects.
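That said, if you only need a single sequential pass and can live without
random access and hard-link resolution, a common workaround is to open the
archive in streaming mode and clear the internal members list as you go, so
memory use stays flat regardless of the member count. A minimal sketch (the
function name is illustrative; `TarFile.members` is the module's internal
table of contents):

```python
import tarfile

def iter_large_tar(path):
    """Yield TarInfo objects from a huge archive with flat memory use."""
    # "r|*" opens the archive as a sequential stream (any compression),
    # so the tarfile module never needs to seek back to earlier members.
    with tarfile.open(path, mode="r|*") as tar:
        for member in tar:
            yield member
            # Drop the accumulated table of contents. This keeps memory
            # usage independent of the number of members, at the cost of
            # random access and hard-link resolution within the archive.
            tar.members = []
```

Each yielded member can still be read immediately via
tar.extractfile(member) before moving on, since streaming mode only
forbids going backwards.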

-- 
Lars Gustäbel
lars at gustaebel.de

Those who would give up essential liberty, to purchase a little
temporary safety, deserve neither liberty nor safety.
(Benjamin Franklin)
