[issue33928] _Py_DecodeUTF8Ex() creates surrogate pairs on Windows

STINNER Victor report at bugs.python.org
Thu Jun 21 06:57:13 EDT 2018


STINNER Victor <vstinner at redhat.com> added the comment:

Excerpt from the _Py_DecodeUTF8Ex() code; note the explicit "write a surrogate pair" comment:

#if SIZEOF_WCHAR_T == 4
        ch = ucs4lib_utf8_decode(&s, e, (Py_UCS4 *)unicode, &outpos);
#else
        ch = ucs2lib_utf8_decode(&s, e, (Py_UCS2 *)unicode, &outpos);
#endif
        if (ch > 0xFF) {
#if SIZEOF_WCHAR_T == 4
            Py_UNREACHABLE();
#else
            assert(ch > 0xFFFF && ch <= MAX_UNICODE);
            /* write a surrogate pair */
            unicode[outpos++] = (wchar_t)Py_UNICODE_HIGH_SURROGATE(ch);
            unicode[outpos++] = (wchar_t)Py_UNICODE_LOW_SURROGATE(ch);
#endif
        }
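
For illustration, here is a minimal sketch of the arithmetic that the #else branch relies on when wchar_t is 16 bits wide: a code point above 0xFFFF is split into a high and a low surrogate. The helper name encode_surrogate_pair() and the 0x10FFFF bound are assumptions made for this example and are not part of CPython; the excerpt itself uses the Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros and MAX_UNICODE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not CPython API): split a code point above the
   Basic Multilingual Plane into a UTF-16 surrogate pair. */
void
encode_surrogate_pair(uint32_t ch, uint16_t *high, uint16_t *low)
{
    /* Same precondition as the assert in the excerpt, with the upper
       bound written out as the last Unicode code point. */
    assert(ch > 0xFFFF && ch <= 0x10FFFF);

    ch -= 0x10000;                              /* 20 significant bits remain */
    *high = (uint16_t)(0xD800 + (ch >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (ch & 0x3FF));  /* bottom 10 bits */
}
```

With this sketch, U+1F600 becomes the pair 0xD83D, 0xDE00, so a single decoded character takes two wchar_t slots in the output buffer.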

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue33928>
_______________________________________

