[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later

STINNER Victor report at bugs.python.org
Fri Jun 17 01:42:34 CEST 2011


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

Patch version 5 fixes the encode/decode flags on Windows XP. The codecs give different results on XP and Seven (Windows 7) in some cases:

Seven:

- b'\x81\x00abc'.decode('cp932', 'replace') returns '\u30fb\x00abc'
- '\udc80'.encode(CP_UTF8, 'strict') raises UnicodeEncodeError
- b'[\xed\xb2\x80]'.decode(CP_UTF8, 'strict') raises UnicodeDecodeError
- b'[\xed\xb2\x80]'.decode(CP_UTF8, 'ignore') returns '[]'
- b'[\xed\xb2\x80]'.decode(CP_UTF8, 'replace') returns '[\ufffd\ufffd\ufffd]'

XP:

- b'\x81\x00abc'.decode('cp932', 'replace') returns '\x00\x00abc'
- '\udc80'.encode(CP_UTF8, 'strict') returns b'\xed\xb2\x80'
- b'[\xed\xb2\x80]'.decode(CP_UTF8, 'strict') returns '[\udc80]'

These differences come from the underlying Windows codecs, which behave differently between the two Windows versions.
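
For comparison, here is a minimal, portable sketch of the same lone-surrogate cases. It uses Python's built-in 'utf-8' codec as a stand-in, not the Windows CP_UTF8 code page handled by the patch, so it only illustrates the two behaviours and does not exercise the mbcs code. The helper name try_call and the results dictionary are made up for this illustration.

```python
# Sketch only: uses Python's built-in 'utf-8' codec, NOT the Windows
# CP_UTF8 code page, so it runs on any platform.
LONE_SURROGATE = '\udc80'
ENCODED = b'[\xed\xb2\x80]'  # UTF-8-style byte sequence for U+DC80, in brackets


def try_call(func, *args):
    """Return func(*args), or the exception class name if it raises UnicodeError."""
    try:
        return func(*args)
    except UnicodeError as exc:
        return type(exc).__name__


results = {
    # Default strict handling rejects lone surrogates in both directions.
    "encode strict": try_call(LONE_SURROGATE.encode, 'utf-8', 'strict'),
    "decode strict": try_call(ENCODED.decode, 'utf-8', 'strict'),
    "decode ignore": try_call(ENCODED.decode, 'utf-8', 'ignore'),
    "decode replace": try_call(ENCODED.decode, 'utf-8', 'replace'),
    # 'surrogatepass' lets lone surrogates through in both directions.
    "encode surrogatepass": try_call(LONE_SURROGATE.encode, 'utf-8', 'surrogatepass'),
    "decode surrogatepass": try_call(ENCODED.decode, 'utf-8', 'surrogatepass'),
}

for name, value in results.items():
    # ascii() keeps the output printable on any console encoding.
    print(f"{name:22} -> {ascii(value)}")
```

With the built-in codec, the strict, ignore and replace cases give the same results as the Seven list above, and the 'surrogatepass' error handler gives the same results as the XP list.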

----------
Added file: http://bugs.python.org/file22389/mbcs5.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12281>
_______________________________________

