[issue12892] UTF-16 and UTF-32 codecs should reject (lone) surrogates

Tue Oct 8 14:19:05 CEST 2013

Martin v. Löwis added the comment:

Marc-Andre: please don't confuse "use in major operating systems" with "major use in operating systems".  I agree with Antoine that UTF-16 isn't widely used on Windows, despite Notepad and Office supporting it. Most Windows users of Notepad continue to use the ANSI code page, and most users of Word use Word files (instead of plain text).

Also, wchar_t on Windows isn't *really* UTF-16. Many APIs support lone surrogates just fine; they are really UCS-2 instead (e.g. the file system APIs). Only starting with Vista does MultiByteToWideChar complain about lone surrogates.
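
To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal sketch in Python. It assumes CPython 3.4 or later, where the strict UTF-16 codecs reject lone surrogates and the "surrogatepass" error handler is accepted by them. The names LONE, encodes_strictly and ucs2_style_bytes are made up for illustration and are not part of any patch on this issue.

```python
# Illustrative sketch only; assumes CPython 3.4+ codec behaviour.

# A high surrogate followed by an ordinary character, i.e. a lone surrogate.
LONE = "\ud800A"


def encodes_strictly(text, encoding="utf-16-le"):
    """Return True if `text` encodes with the default (strict) error handler."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def ucs2_style_bytes(text):
    """Encode like a UCS-2-minded API would: every code unit passes through.

    "surrogatepass" writes the surrogate as a bare 16-bit unit instead of
    raising, which is roughly what surrogate-tolerant wchar_t APIs accept.
    """
    return text.encode("utf-16-le", "surrogatepass")


raw = ucs2_style_bytes(LONE)

print(encodes_strictly("abc"))   # ordinary text is fine
print(encodes_strictly(LONE))    # strict UTF-16 refuses the lone surrogate
print(raw)                       # the bytes a UCS-2 style consumer would see
```

Decoding those bytes with the strict "utf-16-le" codec raises UnicodeDecodeError, while decoding them with "surrogatepass" gives back the original string.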

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12892>
_______________________________________