[issue1597011] Reading with bz2.BZ2File() returns one garbage character

Sean Reifschneider report at bugs.python.org
Tue Aug 28 12:26:08 CEST 2007


Sean Reifschneider added the comment:

There are some bugs in the bz2 module.  The problem boils down to the
following code.  Notice how c is stored into *buf *BEFORE* the check to
see if there was a read error:

   do {
      BZ2_bzRead(&bzerror, f->fp, &c, 1);
      f->pos++;
      *buf++ = c;
   } while (bzerror == BZ_OK && c != '\n' && buf != end);

This could be fixed by putting an "if (bzerror != BZ_OK) break;" after
the BZ2_bzRead() call.
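
As a minimal sketch of that ordering (this is not the attached patch),
the loop below checks the status right after the read and only then
stores the character and advances the position.  Everything prefixed
with sk_ or SK_ is a hypothetical stand-in so the example compiles
without libbz2: sk_read1() is a simplified model of a one-byte read
that reports a non-OK status and writes nothing once the data runs out,
which is an assumption and not a description of BZ2_bzRead() itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; the real ones live in <bzlib.h>. */
enum { SK_OK = 0, SK_END = 1 };

/* Hypothetical in-memory source standing in for the BZFILE handle. */
typedef struct {
    const char *data;
    size_t len;
    size_t next;   /* index of the next unread byte */
} sk_source;

/* Simplified one-byte read: leaves *c untouched when nothing is left. */
static void sk_read1(int *err, sk_source *src, char *c)
{
    if (src->next >= src->len) {
        *err = SK_END;
        return;
    }
    *c = src->data[src->next++];
    *err = SK_OK;
}

/* Reads up to and including '\n' into [buf, end); returns bytes stored.
 * *pos plays the role of f->pos in the original loop. */
static size_t sk_read_line(sk_source *src, char *buf, char *end, size_t *pos)
{
    char *start = buf;
    char c = 0;
    int err = SK_OK;

    assert(buf != NULL && buf < end);
    do {
        sk_read1(&err, src, &c);
        if (err != SK_OK)
            break;      /* nothing was read: do not store c or bump pos */
        (*pos)++;
        *buf++ = c;
    } while (c != '\n' && buf != end);

    return (size_t)(buf - start);
}
```

With a two-byte source, the third read fails and the loop exits without
appending a stale copy of the previous character.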

However, I also noticed that the universal newline section of the code
reads a character, increments f->pos, and only *THEN* checks whether
buf == end; if so, it throws the character away.

I changed the code around so that the read loop is unified between
universal newlines and regular newlines.  This carries a small
performance penalty, since it checks the newline mode for each
character read.  However, we're already making a BZ2_bzRead() call for
every character, so one additional comparison and jump is probably a
fair price for merging duplicate code for maintenance reasons,
especially since this bug existed in only one of the two duplicated
parts of the code.
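
To illustrate the idea only (the attached patch may well be structured
differently), here is a rough sketch of a single loop serving both
newline modes.  All un_/UN_ names, the struct layout and the one-byte
reader are hypothetical stand-ins, not the bz2 module's actual fields
or libbz2 calls.  The loop checks for buffer space before reading, so
no character is consumed and then discarded, and it tests the newline
mode once per character.

```c
#include <assert.h>
#include <stddef.h>

enum { UN_OK = 0, UN_END = 1 };   /* hypothetical status codes */

/* Hypothetical file object; field names are illustrative only. */
typedef struct {
    const char *data;
    size_t len;
    size_t next;          /* index of the next unread byte */
    size_t pos;           /* count of bytes consumed so far */
    int univ_newline;     /* nonzero: translate \r and \r\n to \n */
    int skip_next_lf;     /* set after \r so a following \n is dropped */
} un_file;

/* Simplified one-byte read; writes nothing when the data is exhausted. */
static int un_read1(un_file *f, char *c)
{
    if (f->next >= f->len)
        return UN_END;
    *c = f->data[f->next++];
    return UN_OK;
}

/* One loop for both modes; returns the number of bytes stored. */
static size_t un_getline(un_file *f, char *buf, char *end)
{
    char *start = buf;
    char c = 0;

    assert(buf != NULL && buf <= end);
    while (buf != end) {              /* check space BEFORE reading */
        if (un_read1(f, &c) != UN_OK)
            break;                    /* nothing read: store nothing */
        f->pos++;
        if (f->univ_newline) {        /* the one extra test per character */
            if (f->skip_next_lf) {
                f->skip_next_lf = 0;
                if (c == '\n')
                    continue;         /* second half of \r\n */
            }
            if (c == '\r') {
                f->skip_next_lf = 1;
                c = '\n';
            }
        }
        *buf++ = c;
        if (c == '\n')
            break;
    }
    return (size_t)(buf - start);
}
```

Because the space check comes first, a full buffer leaves the next
character unread for the following call.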

Please let me know if this looks good to commit.

----------
nosy: +jafo

_____________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1597011>
_____________________________________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: python-trunk-bz2.patch
Type: text/x-patch
Size: 1694 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-bugs-list/attachments/20070828/72e7a164/attachment-0001.bin 

