[Python-bugs-list] [ python-Bugs-691291 ] codecs.open(filename, 'U', 'UTF-16') corrupts text
SourceForge.net
noreply@sourceforge.net
Sat, 22 Feb 2003 11:21:01 -0800
Bugs item #691291, was opened at 2003-02-22 19:21
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=691291&group_id=5470
Category: Unicode
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Orendorff (jorend)
Assigned to: M.-A. Lemburg (lemburg)
Summary: codecs.open(filename, 'U', 'UTF-16') corrupts text
Initial Comment:
Tested in Python 2.3a1.
If I write u'Hello\r\nworld\r\n' to a file, then read
it back in 'U' mode, I should get u'Hello\nworld\n'.
However, if I do this using codecs.open() and the
UTF-16 encoding, I get u'Hello\n\nworld\n\n'.
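codecs.open() is not 'U'-mode-aware. It opens the
underlying file in universal newline mode, so the byte
'\x0d' is erroneously translated to '\x0a' in the raw
byte stream before the UTF-16 codec has a chance to
decode it. In UTF-16 the '\x0d' and '\x0a' bytes are
separated by a zero byte, so they are not recognized as
a single '\r\n' pair and each one yields its own '\n'.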
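
The mechanism can be illustrated without codecs.open() at all. The sketch below is a simulation only: the helper translate_newlines_bytewise is a hypothetical stand-in for what a universal-newline file layer does to raw bytes, and it assumes little-endian UTF-16 without a BOM so that the byte layout is deterministic.

```python
# Simulation only: mimics byte-level universal-newline translation.
# It does not call codecs.open() itself.

def translate_newlines_bytewise(raw: bytes) -> bytes:
    """Hypothetical helper: map CRLF and lone CR to LF on raw bytes."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

text = "Hello\r\nworld\r\n"

# Assumption: UTF-16-LE without a BOM, to keep the bytes deterministic.
raw = text.encode("utf-16-le")

# In UTF-16-LE, "\r\n" is the four bytes 0d 00 0a 00, so 0d and 0a are
# never adjacent and the 0d is treated as a lone carriage return.
damaged = translate_newlines_bytewise(raw)
corrupted = damaged.decode("utf-16-le")

print(repr(corrupted))  # 'Hello\n\nworld\n\n'
```

Each original line ending comes out as two newlines, which matches the corrupted result described above.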
The attached unit test shows specifically the behavior
that I would like to work.
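
One way to get the expected result is to decode first and translate newlines afterwards. The following is a minimal sketch under that assumption, not a proposed patch to the codecs module; read_text_universal is a hypothetical helper name, and the file is opened in binary mode so that no newline translation touches the encoded bytes.

```python
import os
import tempfile

def read_text_universal(path, encoding):
    """Hypothetical helper: decode the bytes first, then normalize newlines."""
    with open(path, "rb") as f:  # binary mode: bytes reach the codec untouched
        text = f.read().decode(encoding)
    # Newline translation happens on decoded characters, not on raw bytes.
    return text.replace("\r\n", "\n").replace("\r", "\n")

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write("Hello\r\nworld\r\n".encode("utf-16"))
    result = read_text_universal(path, "utf-16")
finally:
    os.remove(path)

print(repr(result))  # 'Hello\nworld\n'
```

Doing the translation after decoding means the '\r\n' pair is seen as two adjacent characters, so it collapses to a single '\n' regardless of the encoding's byte layout.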