[Tutor] Reading large bz2 Files

Steven D'Aprano steve at pearwood.info
Fri Feb 19 17:04:31 CET 2010


On Fri, 19 Feb 2010 11:42:07 pm Norman Rieß wrote:
> Hello,
>
> I am trying to read a large bz2 file with this code:
>
> source_file = bz2.BZ2File(file, "r")
> for line in source_file:
>      print line.strip()
>
> But after 4311 lines, it stops without an error message. The bz2 file
> is much bigger though.
>
> How can I read the whole file line by line?

"for line in file" works for me:


>>> import bz2
>>>
>>> writer = bz2.BZ2File('file.bz2', 'w')
>>> for i in xrange(20000):
...     # write some variable text to a line
...     writer.write('abc'*(i % 5) + '\n')
...
>>> writer.close()
>>> reader = bz2.BZ2File('file.bz2', 'r')
>>> i = 0
>>> for line in reader:
...     i += 1
...
>>> reader.close()
>>> i
20000


My guess is one of two things:

(1) You are mistaken that the file is bigger than 4311 lines.

(2) You are using Windows, and somehow there is a Ctrl-Z (0x1A, decimal 26)
character in the file, which Windows interprets as End Of File when 
reading files in text mode. Try changing the mode to "rb" and see if 
the behaviour goes away.
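
One way to test both guesses at once is a small helper along these lines. 
It is only a sketch: the function name count_lines_and_ctrl_z is made up 
for illustration, and it assumes the archive can be opened with 
bz2.BZ2File in "rb" mode. It counts the lines the reader actually yields 
and how many Ctrl-Z bytes appear in the decompressed data.

```python
import bz2

def count_lines_and_ctrl_z(path):
    """Return (line_count, ctrl_z_count) for the decompressed file.

    Hypothetical helper, not part of the bz2 module.
    """
    line_count = 0
    ctrl_z_count = 0
    # Open in binary mode so no byte is treated as an end-of-file marker.
    reader = bz2.BZ2File(path, 'rb')
    try:
        for line in reader:
            line_count += 1
            # 0x1A is the Ctrl-Z byte.
            ctrl_z_count += line.count(b'\x1a')
    finally:
        reader.close()
    return line_count, ctrl_z_count
```

If the line count still stops at 4311 and no Ctrl-Z bytes turn up, then 
guess (2) is probably wrong and the file itself deserves a closer look.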




-- 
Steven D'Aprano

