Read a gzip file from inside a tar file

Fredrik Lundh fredrik at pythonware.com
Mon Dec 13 16:07:30 EST 2004


Craig Ringer wrote:

>> These are huge files. My goal is to analyze the content of the gzip
>> file in the tar file without having to un gzip.  If that is possible.
>
> As far as I know, gzip is a stream compression algorithm that can't be
> decompressed in small blocks. That is, I don't think you can seek 500k
> into a 1MB file and decompress the next 100k.

correct.

> I'd say you'll have to progressively read the file from the beginning,
> processing and discarding as you go. It looks like a no-brainer to me -
> see zlib.decompressobj.

it can be a bit tricky to set things up properly, though.  here's a piece
of code that uses Python's good old consumer interface to decode things
incrementally:

    http://effbot.org/zone/consumer-gzip.htm

you can use this as is: just create a "target consumer", wrap it in the
gzip consumer, and feed data to the gzip consumer in suitable pieces.

alternatively, hack it until it does what you want.
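
for illustration, here's a rough sketch of the same idea using only the
standard library. this is not the code at the URL above; LineCounter and
feed_gzip_member are made-up names, the consumer is assumed to have feed()
and close() methods, and the gzip member is assumed to hold a single gzip
stream. the tar file itself is assumed to be uncompressed (or at least
something tarfile can open in "r" mode):

```python
import tarfile
import zlib


class LineCounter:
    """Hypothetical target consumer: counts newlines in the decoded data."""

    def __init__(self):
        self.lines = 0

    def feed(self, data):
        self.lines += data.count(b"\n")

    def close(self):
        pass


def feed_gzip_member(tar_path, member_name, consumer, chunk_size=64 * 1024):
    """Decode one gzip member of a tar archive piece by piece (a sketch)."""
    # 16 + MAX_WBITS asks zlib to expect a gzip header and trailer
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    with tarfile.open(tar_path, "r") as tar:
        source = tar.extractfile(member_name)
        if source is None:
            raise ValueError("%r is not a regular file" % (member_name,))
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            data = decoder.decompress(chunk)
            if data:
                # hand over the decoded piece, then let it be discarded
                consumer.feed(data)
    tail = decoder.flush()
    if tail:
        consumer.feed(tail)
    consumer.close()
```

usage would be something like feed_gzip_member("archive.tar", "data.gz",
LineCounter()), where both file names are placeholders. only one chunk of
compressed data and its decoded output are held in memory at a time, so
nothing has to be unpacked to disk first.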

</F> 





