[Tutor] Reading large bz2 Files

Lie Ryan lie.1296 at gmail.com
Fri Feb 19 22:14:31 CET 2010


On 02/20/10 07:42, Lie Ryan wrote:
> On 02/19/10 23:42, Norman Rieß wrote:
>> Hello,
>>
>> I am trying to read a large bz2 file with this code:
>>
>> source_file = bz2.BZ2File(file, "r")
>> for line in source_file:
>>     print line.strip()
>>
>> But after 4311 lines, it stops without an error message. The bz2 file is
>> much bigger, though.
>> How can I read the whole file line by line?
> 
> Is the bz2 file an archive[1]?
> 
> [1] archive: contains more than one file

Or, more clearly: does the bz2 file contain multiple files compressed using
the -c flag? With -c, bzip2 writes a simple concatenation of multiple
compressed streams to stdout; such a file can be decompressed correctly only
by bzip2 0.9.0 or later[1].

You cannot use bz2.BZ2File to open this; instead, use the stream
decompressor bz2.BZ2Decompressor.
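
As a rough sketch of the BZ2Decompressor approach (not tested against your
file): read the raw file in chunks, feed them to a decompressor, and start a
fresh decompressor whenever one stream ends. The function name
iter_multistream_lines and the chunk_size default are made up for
illustration, and the code assumes Python 3.3 or later, where the
decompressor has the eof attribute. On those versions bz2.BZ2File reads
multi-stream files by itself, so this is mainly useful to show the mechanism.

```python
import bz2


def iter_multistream_lines(path, chunk_size=64 * 1024):
    """Yield lines (bytes, without the newline) from concatenated bz2 streams."""
    decompressor = bz2.BZ2Decompressor()
    pending = b""  # decompressed bytes that do not yet end in a newline
    with open(path, "rb") as raw:
        while True:
            chunk = raw.read(chunk_size)
            if not chunk:
                break
            while chunk:
                pending += decompressor.decompress(chunk)
                if decompressor.eof:
                    # This stream is finished; whatever follows it in the
                    # chunk belongs to the next stream.
                    chunk = decompressor.unused_data
                    decompressor = bz2.BZ2Decompressor()
                else:
                    chunk = b""
                # Everything before the last newline is a complete line.
                *lines, pending = pending.split(b"\n")
                for line in lines:
                    yield line
    if pending:
        # Last line of the file had no trailing newline.
        yield pending
```

It would be used much like the original loop, e.g.
"for line in iter_multistream_lines(file): print(line.strip())".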

A better approach is to use a real archiving format (e.g. tar).
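
If the files were packed as a bzip2-compressed tar instead, the standard
tarfile module can walk the members one by one. A minimal sketch, assuming
Python 3 and an archive of plain text files; the function name
iter_tar_lines is hypothetical:

```python
import tarfile


def iter_tar_lines(path):
    """Yield (member_name, line) for every regular file in a .tar.bz2 archive."""
    with tarfile.open(path, "r:bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, etc.
            extracted = archive.extractfile(member)
            for line in extracted:
                yield member.name, line.rstrip(b"\n")
```

Each line comes back together with the name of the file it came from, which
a bare concatenation of bz2 streams cannot tell you.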

[1] http://www.bzip.org/1.0.3/html/description.html


