[issue12892] UTF-16 and UTF-32 codecs should reject (lone) surrogates

Martin v. Löwis report at bugs.python.org
Tue Oct 8 14:19:05 CEST 2013


Martin v. Löwis added the comment:

Marc-Andre: please don't confuse "use in major operating systems" with "major use in operating systems".  I agree with Antoine that UTF-16 isn't widely used on Windows, despite Notepad and Office supporting it. Most Windows users who use Notepad continue to use the ANSI code page, and most users of Word save Word files (instead of plain text).

Also, wchar_t on Windows isn't *really* UTF-16. Many APIs accept lone surrogates just fine; they are really UCS-2 instead (e.g. the file system APIs). Only starting with Vista does MultiByteToWideChar complain about lone surrogates.
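
To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal sketch (not taken from any Windows or CPython source; the name find_lone_surrogates is hypothetical). A UCS-2 consumer treats every 16-bit unit as a character and accepts any sequence, whereas a strict UTF-16 consumer has to reject the positions this function reports:

```python
def find_lone_surrogates(units):
    """Return the indices of unpaired surrogates in a sequence of
    16-bit code units (plain ints in the range 0..0xFFFF).

    Hypothetical helper for illustration only.
    """
    lone = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair, skip both units
                continue
            lone.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            lone.append(i)
        i += 1
    return lone


# 'A' followed by a well-formed pair: nothing to reject.
print(find_lone_surrogates([0x0041, 0xD83D, 0xDE00]))  # []
# A high surrogate followed by 'A': fine as UCS-2, invalid as UTF-16.
print(find_lone_surrogates([0xD800, 0x0041]))          # [0]
```

A file name containing the second sequence would pass through a UCS-2 style API unchanged, which is why a strict UTF-16 codec cannot round-trip everything such APIs hand back.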

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12892>
_______________________________________

