[Python-bugs-list] [ python-Bugs-691291 ] codecs.open(filename, 'U', 'UTF-16') corrupts text

SourceForge.net noreply@sourceforge.net
Sat, 22 Feb 2003 11:21:01 -0800


Bugs item #691291, was opened at 2003-02-22 19:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=691291&group_id=5470

Category: Unicode
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Orendorff (jorend)
Assigned to: M.-A. Lemburg (lemburg)
Summary: codecs.open(filename, 'U', 'UTF-16') corrupts text

Initial Comment:
Tested in Python 2.3a1.

If I write u'Hello\r\nworld\r\n' to a file, then read
it back in 'U' mode, I should get u'Hello\nworld\n'.

However, if I do this using codecs.open() and the
UTF-16 encoding, I get u'Hello\n\nworld\n\n'.

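codecs.open() is not 'U'-mode-aware.  The underlying
file is opened in universal newline mode, so the byte
'\x0d' is erroneously translated to '\x0a' before the
UTF-16 codec has a chance to decode it.  In UTF-16 the
'\x0d' and '\x0a' bytes of a CR LF pair are separated by
a '\x00' byte, so they are never seen as a single line
ending, and each pair decodes as two newlines.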
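
The mechanism can be reproduced without touching the
filesystem.  The following is only a sketch of the
suspected behaviour, not the code in codecs.py: the two
helper functions are hypothetical stand-ins for universal
newline translation, and 'utf-16-le' is used instead of
'UTF-16' so that the byte order is fixed and no BOM is
involved.

```python
# Sketch: universal-newline translation applied to raw bytes
# (what the report describes) versus applied to decoded text
# (what the reporter expects).  Helper names are hypothetical.

def translate_newlines_bytes(data):
    # Byte-level translation, as a 'U'-mode file would do it.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def translate_newlines_text(text):
    # Character-level translation, applied after decoding.
    return text.replace(u"\r\n", u"\n").replace(u"\r", u"\n")

original = u"Hello\r\nworld\r\n"
raw = original.encode("utf-16-le")

# Translating the bytes first turns each 0d 00 0a 00 into
# 0a 00 0a 00, which decodes as two newlines.
corrupted = translate_newlines_bytes(raw).decode("utf-16-le")

# Decoding first and translating afterwards keeps one newline.
expected = translate_newlines_text(raw.decode("utf-16-le"))

print(repr(corrupted))
print(repr(expected))
```

Under these assumptions, a fix would presumably open the
underlying file in plain binary mode and apply newline
translation only to the decoded text.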

The attached unit test shows exactly what I expect to
work.

