Unicode problem in ucs4

"Martin v. Löwis" martin at v.loewis.de
Mon Mar 23 19:55:32 EDT 2009


> So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3
> \0s after a char, printf or wprintf is only printing one letter.

No. printf will indeed see a terminating NUL byte right after the
first character. However, wprintf knows that a wchar_t has four bytes
per character, and should print the string correctly. Make sure to use
%ls to print wchar_t arrays; %s would print multi-byte character strings.
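
To illustrate the byte layout in question, here is a minimal Python
sketch. It assumes Python 3 syntax and a little-endian machine with a
4-byte wchar_t; the function names are made up for the example. It lays
out a short string as 4-byte code units and shows what a byte-oriented
reader that stops at the first NUL would consume.

```python
# Sketch: why a byte-oriented printf("%s") stops after one letter.
# Assumption: little-endian machine, 4-byte wchar_t (UCS-4 layout).

def as_wide_bytes(text):
    """Return text laid out as 4-byte little-endian code units."""
    return text.encode("utf-32-le")


def seen_by_char_printf(raw):
    """Return the bytes a NUL-terminated char* reader would consume."""
    return raw.split(b"\x00", 1)[0]


raw = as_wide_bytes("abc")
print(raw)                       # b'a\x00\x00\x00b\x00\x00\x00c\x00\x00\x00'
print(seen_by_char_printf(raw))  # b'a'
```

The three zero bytes after each ASCII letter are what end the string
early for a char*-based function; a wide-character function reads whole
4-byte units and is not affected.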

> I need to further process the data and those libraries will need the
> data in UCS2 format (2 bytes), otherwise they fail.

Are you absolutely sure about that? Why does that library expect
UCS-2, when your system's wchar_t is four bytes?

In any case, do what MAL told you: use the UCS-2 codec to convert
the Unicode string to a 2-bytes-per-char byte string. The PyObject
you get from the conversion is a byte string object; use
PyString_AsStringAndSize to get to the actual bytes.
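
As a rough sketch of that conversion at the Python level (Python 3
syntax assumed, helper names hypothetical): I am using the standard
"utf-16-le" codec here, since I am not certain a codec literally named
UCS-2 is available. For characters in the Basic Multilingual Plane the
resulting bytes are the same as UCS-2. Little-endian byte order without
a byte order mark is also an assumption; check what the library
actually expects.

```python
# Sketch: convert a Unicode string to a 2-bytes-per-char byte string.
# Assumption: the consuming library wants little-endian data, no BOM.

def fits_ucs2(text):
    """True if every character is in the BMP (one 16-bit unit each)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def to_two_byte_string(text):
    """Encode text as UTF-16 little-endian without a byte order mark."""
    return text.encode("utf-16-le")


name = "L\u00f6wis"
data = to_two_byte_string(name)
print(fits_ucs2(name))  # True
print(len(data))        # 10
print(data)             # b'L\x00\xf6\x00w\x00i\x00s\x00'
```

Characters outside the BMP come out as surrogate pairs (four bytes),
which a strict UCS-2 consumer may not handle, hence the fits_ucs2 check.
The plain "utf-16" codec would prepend a byte order mark, which is why
the sketch names the byte order explicitly.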

Regards,
Martin


