Unicode problem in ucs4

abhi abhigyan_agrawal at in.ibm.com
Wed Mar 25 09:51:31 EDT 2009


On Mar 24, 4:55 am, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> > So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3
> > \0s after a char, printf or wprintf is only printing one letter.
>
> No. printf indeed will see a terminating character. However, wprintf
> should correctly know that a wchar_t has four bytes per character,
> and print it correctly. Make sure to use %ls to print wchar_t arrays;
> %s would print multi-byte character strings.
>
> > I need to further process the data and those libraries will need the
> > data in UCS2 format (2 bytes), otherwise they fail.
>
> Are you absolutely sure about that? Why does that library expect
> UCS-2, when your system's wchar_t is four bytes?
>
> In any case, do what MAL told you: use the UCS-2 codec to convert
> the Unicode string to a 2-bytes-per-char byte string. The PyObject
> you get from the conversion is a byte string object; use
> PyString_AsStringAndSize to get to the actual bytes.
>
> Regards,
> Martin
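
The point about printf stopping after one letter can be illustrated with a small sketch. It assumes Python 3 and a little-endian UCS-4 layout (the "utf-32-le" codec); the variable names are made up for illustration and are not from the original code.

```python
# Encode a short ASCII string as UCS-4 (UTF-32, little-endian, no BOM).
# Assumption: the in-memory layout under discussion is little-endian.
text = "abc"
ucs4 = text.encode("utf-32-le")

# Each character occupies four bytes; for ASCII, three of them are NUL.
bytes_per_char = len(ucs4) // len(text)

# A byte-oriented routine such as printf("%s") stops at the first NUL byte,
# so it effectively sees only the bytes before it.
seen_by_byte_printf = ucs4.split(b"\x00", 1)[0]

print(repr(ucs4))
print(repr(seen_by_byte_printf))
```

A wide-character routine reads whole four-byte units instead, which is why wprintf with %ls does not stop early.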

Thanks Marc and Martin, my preliminary trials are showing positive
results with this method.
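
For the archive, a minimal Python-level sketch of the conversion, assuming Python 3 and assuming that "UCS-2" here can be served by the "utf-16-le" codec, which produces the same bytes as UCS-2 for characters in the Basic Multilingual Plane. The helper name to_ucs2_bytes is hypothetical, and the byte order is an assumption that depends on what the consuming library expects.

```python
import struct


def to_ucs2_bytes(text):
    """Hypothetical helper: return `text` as 2-bytes-per-character data.

    Assumption: little-endian byte order, no BOM. Characters outside the
    Basic Multilingual Plane are rejected, because UTF-16 would encode them
    as surrogate pairs (4 bytes), which a strict UCS-2 consumer may not accept.
    """
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("character U+%06X does not fit in UCS-2" % ord(ch))
    return text.encode("utf-16-le")


data = to_ucs2_bytes("h\u00e9llo")

# Unpack the bytes as unsigned 16-bit units to confirm the 2-byte layout.
units = struct.unpack("<%dH" % (len(data) // 2), data)
print(len(data), units)
```

If the library wants big-endian data or a leading byte order mark, "utf-16-be" or "utf-16" would be the codecs to try instead.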

-
Abhigyan


