[issue12010] Compile fails when sizeof(wchar_t) == 1

Fri May 6 22:31:05 CEST 2011

Marc-Andre Lemburg <mal at egenix.com> added the comment:

David Coles wrote:
> 
> David Coles <coles.david at gmail.com> added the comment:
> 
> After doing some more investigation it appears that Android's wchar_t support before android-9 is totally broken (see http://android.git.kernel.org/?p=platform/ndk.git;a=blob_plain;f=docs/STANDALONE-TOOLCHAIN.html;hb=HEAD). With android-9 you get 4 byte wchar_t and working wide character functions.
>
> Possibly of more interest for Python is that it's no longer buildable without wchar_t support. While unicodeobject is pretty good at checking HAVE_WCHAR_H, a number of modules and even pythonrun.c directly use wchar_t or functions like PyUnicode_FromWideChar without providing a fallback. Does Python 3 now require wchar_t or are these bugs? (either option seems sensible).

wchar_t should be fairly portable these days. I think the main
problem is that we never assumed sizeof(wchar_t) == 1 to be a
possibility. On Windows, wchar_t was 16 bit and the glibc started
out with 32 bits.

> A few other notes:
> HAVE_USABLE_WCHAR_T looks like it was a check for unsigned/>16 bits wchar_t that would allow them to be directly memcpy'd. The code in unicodeobject.c seems not to really use this anymore except (it's happy with signed or unsigned) and it looks like the check is only used for Windows now.

Note that HAVE_USABLE_WCHAR_T is only used to check whether
Python can use wchar_t as alias for Py_UNICODE. Python's Unicode
implementation needs Py_UNICODE to be an unsigned type with
either 2 bytes or 4 bytes. If wchar_t does not provide these
sizes or is a signed type, Python cannot use it for Py_UNICODE
and must instead use "unsigned short".

If the configure script does not detect this case, then a patch
would be helpful.

The other wchar_t C lib functions should still remain usable,
though.

> To properly support wchar_t of size 1 you would basically implement multibyte character storage either with UTF-8 or just packing two wchar_t's with UTF-16. At least in Android the distinction doesn't seem to matter as Android's internationalziation/localization policy seems to be "use Java".

Python should not use wchar_t for Py_UNICODE on such platforms
and instead go with "unsigned short".

I would assume that the wchar_t C lib routines work based on UTF-8
with sizeof(wchar_t) == 1, so the PyUnicode_*WideChar*() APIs would
need to be adjusted to work more or less like the UTF-8 codecs.

----------
nosy: +lemburg

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12010>
_______________________________________