[Python-Dev] The C API and wide unicode support

Walter Dörwald walter@livinglogic.de
Wed, 10 Jul 2002 16:57:16 +0200


Michael Hudson wrote:

> It may be best to allow this particular dead horse to go on being
> dead, but I thought I'd ask here.  Beats work, anyway.
> 
> Picture the situation: you're wrapping a C library that returns a
> unicode string (let's say encoded as UCS-2).  You want to return this
> as a Python object.  So you'd think you can write
> 
> return PyUnicode_Decode(encstr, "ucs-2", NULL);

There is no "ucs-2" encoding. This should be "utf-16", "utf-16-le"
or "utf-16-be".
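
The three names differ only in how byte order is settled: plain "utf-16" looks for a byte order mark at the start of the data, while "utf-16-le" and "utf-16-be" fix the order up front. A minimal C sketch of that difference follows. It does not use the Python headers, and the helpers demo_read_u16 and demo_detect_bom are hypothetical names made up for illustration, not part of any real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one 16-bit code unit from two bytes,
   using an explicitly chosen byte order (as "utf-16-le"/"utf-16-be" do). */
uint16_t demo_read_u16(const unsigned char *p, int big_endian)
{
    assert(p != NULL);
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

/* Hypothetical helper: inspect a leading byte order mark, roughly what a
   plain "utf-16" decoder has to do first.
   Returns 1 for a big-endian BOM, 0 for a little-endian BOM, -1 if none. */
int demo_detect_bom(const unsigned char *p, size_t nbytes)
{
    assert(p != NULL);
    if (nbytes < 2)
        return -1;
    if (p[0] == 0xFE && p[1] == 0xFF)
        return 1;
    if (p[0] == 0xFF && p[1] == 0xFE)
        return 0;
    return -1;
}
```

The same two bytes 0x41 0x00 come out as U+0041 when read little-endian and as U+4100 when read big-endian, so a C library that hands back BOM-less 16-bit data needs one of the explicit "-le" or "-be" names.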

> (or something close to that).  But for reasons that escape me,
> PyUnicode_Decode is included in the API renaming in
> Include/unicodeobject.h, so if you want to provide binaries you have
> to provide two, and you can be sure that users will have no idea which
> they need.
> 
> So, questions:
> 
> (1) am I correct in thinking that PyUnicode_Decode (and a bunch of
>     others) could safely be omitted from the renaming?

No, because the unicode objects generated will consist of either
UCS-2 or UCS-4 "characters", depending on how the interpreter was
built. This has nothing to do with the encoding of the byte array
from which you create the unicode object.

Any C function that uses Unicode objects in any way needs name
mangling, because the storage layout of the Unicode objects
differs between the UCS-2 and the UCS-4 build.
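
To make both points concrete, here is a minimal, self-contained C sketch. It deliberately avoids the real Python headers: demo_unit, DEMO_WIDE and demo_decode_utf16le are hypothetical stand-ins for Py_UNICODE, Py_UNICODE_WIDE and a renamed API function, and the decoder is a simplified illustration rather than Python's actual codec. The input is always UTF-16-LE bytes, yet the output storage unit, and therefore the symbol the linker sees, depends on the build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DEMO_WIDE is a hypothetical stand-in for Py_UNICODE_WIDE.
   The macro renames the function, so callers always write
   demo_decode_utf16le but link against a width-specific symbol. */
#ifdef DEMO_WIDE
typedef uint32_t demo_unit;   /* 4-byte (UCS-4) storage unit */
#define demo_decode_utf16le demoUCS4_decode_utf16le
#else
typedef uint16_t demo_unit;   /* 2-byte (UCS-2) storage unit */
#define demo_decode_utf16le demoUCS2_decode_utf16le
#endif

/* Decode UTF-16-LE bytes into demo_unit storage.
   Returns the number of units written, or (size_t)-1 if the input has an
   odd length or the output buffer is too small. */
size_t demo_decode_utf16le(const unsigned char *bytes, size_t nbytes,
                           demo_unit *out, size_t cap)
{
    size_t n = 0;
    assert(bytes != NULL && out != NULL);
    if (nbytes % 2 != 0)
        return (size_t)-1;
    for (size_t i = 0; i < nbytes; i += 2) {
        uint32_t u = (uint32_t)bytes[i] | ((uint32_t)bytes[i + 1] << 8);
#ifdef DEMO_WIDE
        /* Wide storage: fold a surrogate pair into a single unit. */
        if (u >= 0xD800 && u <= 0xDBFF && i + 3 < nbytes) {
            uint32_t lo = (uint32_t)bytes[i + 2]
                        | ((uint32_t)bytes[i + 3] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;   /* skip the low surrogate just consumed */
            }
        }
#endif
        if (n == cap)
            return (size_t)-1;
        out[n++] = (demo_unit)u;   /* narrow storage keeps surrogates as-is */
    }
    return n;
}
```

Compiled as is, the object file should export demoUCS2_decode_utf16le; compiled with -DDEMO_WIDE it should export demoUCS4_decode_utf16le instead. An extension built against one header and loaded into the other kind of interpreter then fails at link time rather than silently reading a buffer with the wrong element size, which is the purpose of the renaming.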

> (2) if so, is it worth omitting those APIs that could be omitted for 2.3?
> 
> This train of thinking came about because the version of 2.2 that
> comes with Redhat 7.3 is compiled with wide unicode support (which
> surprised me), and so the pygame RPMs broke.

I don't know why; probably because sizeof(wchar_t) == 4 on that platform?

Bye,
    Walter Dörwald