How to process a very large (4Gb) tarfile from python?

Uwe Schmitt rocksportrocker at googlemail.com
Thu Jul 17 13:39:23 EDT 2008


On 17 Jul., 17:55, Terry Carroll <carr... at nospam-tjc.com> wrote:
> On Thu, 17 Jul 2008 06:14:45 -0700 (PDT), Uwe Schmitt
>
> <rocksportroc... at googlemail.com> wrote:
> >I had a look at tarfile.py in the lib path of my current Python 2.5
> >installation. The iterator caches TarInfo objects in the list
> >tf.members. If you only want to iterate and are not interested
> >in more functionality, you could use "tf.members=[]" inside
> >your loop. This is a dirty hack!
>
> Thanks, Uwe.  That works fine for me.  It now reads through all 2.5
> million members, in about 30 minutes, never going above a 4M working
> set.

Maybe we should post this issue to the python-dev mailing list.
Parsing large tar files is not uncommon.
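
For anyone finding this thread later, the workaround quoted above could
look roughly like the sketch below. It is only a sketch: the function name
scan_tar is made up for illustration, it is written for current Python 3
rather than the 2.5 discussed here, and it relies on tf.members, which is
an internal attribute of TarFile and not a documented API, so it may
behave differently in other versions.

```python
import tarfile


def scan_tar(path):
    """Walk every member of a tar archive while keeping the TarInfo cache empty.

    scan_tar is a hypothetical helper name, not part of the tarfile module.
    Returns (number of members, total size in bytes of the regular files).
    """
    count = 0
    total_bytes = 0
    tf = tarfile.open(path, "r")
    try:
        for info in tf:
            count += 1
            if info.isreg():
                total_bytes += info.size
            # The dirty hack from this thread: TarFile keeps each TarInfo it
            # has read in the list tf.members. Resetting that list on every
            # pass lets the old objects be garbage collected.
            # Assumption: tf.members is an implementation detail.
            tf.members = []
    finally:
        tf.close()
    return count, total_bytes
```

Note that after the list has been cleared, lookups by name such as
tf.getmember() can no longer find members that were already read, so this
only suits a single sequential pass over the archive.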

Greetings, Uwe


