[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

Sat Jan 17 16:19:15 CET 2009

Marc-Andre Lemburg <mal at egenix.com> added the comment:

On 2009-01-17 14:00, STINNER Victor wrote:
> STINNER Victor <victor.stinner at haypocalc.com> added the comment:
> 
>> Looks pretty good at first glance, except that it seems that the UTF-32 to
>> UTF-16 translation is skipped if HAVE_USABLE_WCHAR_T is defined.  Is that
>> deliberate?
> 
> #ifdef HAVE_USABLE_WCHAR_T
>     memcpy(unicode->str, w, size * sizeof(wchar_t));
> #else
>     ...
> #endif
> 
> I understand this code as: sizeof(wchar_t) == sizeof(Py_UNICODE). If I 
> misunderstood the code, it's a heap overflow :-) So there is no
> conversion from UTF-32 to UTF-16 using memcpy if HAVE_USABLE_WCHAR_T is 
> defined, right?

If HAVE_USABLE_WCHAR_T is defined, Py_UNICODE is defined as wchar_t,
so a memcpy can be used. Note that this says nothing about
sizeof(wchar_t): with GLIBC, wchar_t is 4 bytes, while the MS C lib
defines it as 2 bytes.

That said, if Py_UNICODE is the same as wchar_t, no conversion is
necessary and that's why the function simply copies over the data.

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4474>
_______________________________________