Python 3.0 automatic decoding of UTF16

John Machin sjmachin at lexicon.net
Sun Dec 7 04:46:34 EST 2008


On Dec 7, 8:15 pm, Terry Reedy <tjre... at udel.edu> wrote:
> John Machin wrote:
> > Here's the scoop: It's a bug in the newline handling (in io.py, class
> > IncrementalNewlineDecoder, method decode). It reads text files in 128-
> > byte chunks. Converting CR LF to \n requires special-case handling
> > when '\r' is detected at the end of decoded chunk n, in case
> > there's an LF at the start of chunk n+1. Buggy solution: prepend b'\r'
> > to the chunk n+1 bytes and decode that -- suddenly with a 2-bytes-per-
> > char encoding like UTF-16 we are 1 byte out of whack.

> Please post this on the tracker so it can get included with other io
> work for 3.0.1.

I'm fiddling with a short bug-demo script right now.
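The mechanism in the quoted analysis can be reproduced in a few lines. What follows is a minimal sketch, not the bug-demo script mentioned above and not the actual io.py code. buggy_decode and fixed_decode are hypothetical helpers, the sample text and the 6-byte split are made up, and two small chunks stand in for the 128-byte reads. It assumes UTF-16-LE without a BOM, so every character in the sample is one 2-byte code unit.

```python
import codecs
import io

# Hypothetical sample: a CR LF pair that straddles a chunk boundary.
TEXT = "ab\r\ncd"
DATA = TEXT.encode("utf-16-le")   # 2 bytes per character here, no BOM
CHUNKS = [DATA[:6], DATA[6:]]     # chunk 1 ends right after the '\r' code unit


def translate(s):
    """Universal-newline translation of an already-decoded piece."""
    return s.replace("\r\n", "\n").replace("\r", "\n")


def buggy_decode(chunks, encoding="utf-16-le"):
    """Hypothetical reconstruction of the reported bug: a trailing '\r' is
    held back and re-injected as the single byte b'\r' before the next chunk."""
    out, prefix = [], b""
    for chunk in chunks:
        # With a 1-byte prefix, every 2-byte code unit is now misaligned.
        piece = (prefix + chunk).decode(encoding, errors="replace")
        prefix = b""
        if piece.endswith("\r"):
            piece = piece[:-1]
            prefix = b"\r"        # 1 byte, but a UTF-16 code unit is 2 bytes
        out.append(translate(piece))
    if prefix:                    # a '\r' left over at end of input
        out.append("\n")
    return "".join(out)


def fixed_decode(chunks, encoding="utf-16-le"):
    """Carry the pending '\r' as decoded text, never as raw bytes, and let an
    incremental decoder buffer any partial code unit between chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    out, pending_cr = [], False
    for chunk in chunks:
        piece = decoder.decode(chunk)
        if pending_cr:
            piece, pending_cr = "\r" + piece, False
        if piece.endswith("\r"):
            piece, pending_cr = piece[:-1], True
        out.append(translate(piece))
    tail = decoder.decode(b"", final=True)
    if pending_cr:
        tail = "\r" + tail
    out.append(translate(tail))
    return "".join(out)


print(repr(buggy_decode(CHUNKS)))   # '\n' is lost, the rest is garbage
print(repr(fixed_decode(CHUNKS)))   # 'ab\ncd'
print(repr(io.TextIOWrapper(io.BytesIO(DATA), encoding="utf-16-le").read()))
```

The design point is where the held-back '\r' gets re-injected: after decoding, as a str, so it can never shift the byte alignment of a multi-byte encoding. The last line uses the standard io.TextIOWrapper, which on current Python versions reads this split correctly as 'ab\ncd'.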
