[C++-sig] Some thoughts on py3k support

Haoyu Bai divinekid at gmail.com
Thu Mar 19 09:02:35 CET 2009


On Thu, Mar 19, 2009 at 6:40 AM, Ralf W. Grosse-Kunstleve
<rwgk at yahoo.com> wrote:
>
> I tried the code below with Python 2.x. For a given str or unicode object, it copies the
> bytes in memory (char*) to a list of 1-character strings. I'm getting
>
> "hello" =  ['h', 'e', 'l', 'l', 'o']
> u"hello" =  ['h', '\x00', 'e', '\x00', 'l', '\x00', 'l', '\x00', 'o', '\x00']
> U"hello" =  ['h', '\x00', 'e', '\x00', 'l', '\x00', 'l', '\x00', 'o', '\x00']
>
> on platforms with sizeof(PY_UNICODE_TYPE) = 2 and
>
> "hello" =  ['h', 'e', 'l', 'l', 'o']
> u"hello" =  ['h', '\x00', '\x00', '\x00', 'e', '\x00', '\x00', '\x00', 'l', '\x00', '\x00', '\x00', 'l', '\x00', '\x00', '\x00', 'o', '\x00', '\x00', '\x00']
> U"hello" =  ['h', '\x00', '\x00', '\x00', 'e', '\x00', '\x00', '\x00', 'l', '\x00', '\x00', '\x00', 'l', '\x00', '\x00', '\x00', 'o', '\x00', '\x00', '\x00']
>
> on platforms with sizeof(PY_UNICODE_TYPE) = 4.
>
> Will the results be different using Python 3?

The result in Python 3 will be (on a platform with sizeof(PY_UNICODE_TYPE) = 2):

"hello" =  ['h', '\x00', 'e', '\x00', 'l', '\x00', 'l', '\x00', 'o', '\x00']
b"hello" =  ['h', 'e', 'l', 'l', 'o']

and the u"hello" literal is no longer valid syntax.
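To see the same byte layouts without writing any C++, here is a small Python 3 sketch. It is an emulation, not the original test code: in Python 3.0 a str was still stored as a Py_UNICODE buffer, but since Python 3.3 (PEP 393) the internal width varies per string. The sketch therefore assumes a little-endian machine and reproduces the 2-byte and 4-byte layouts with explicit "utf-16-le" and "utf-32-le" encodings. The helper name bytes_as_char_list is made up for illustration.

```python
def bytes_as_char_list(data: bytes) -> list:
    """Split a byte buffer into a list of 1-character strings, one per byte."""
    return [chr(b) for b in data]


# b"hello" is a byte string: one byte per character.
print('b"hello" = ', bytes_as_char_list(b"hello"))

# Emulate the internal str buffer with explicit little-endian encodings
# (assumption: little-endian machine, as in the output quoted above).
print('"hello" (2-byte) = ', bytes_as_char_list("hello".encode("utf-16-le")))
print('"hello" (4-byte) = ', bytes_as_char_list("hello".encode("utf-32-le")))
```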

>
> I have quite a few C++ functions with const char* arguments, expecting one byte per character.
>
>> - convert char* and std::string to/from Python 3 unicode string.
>
> How would this work exactly?
> Is the plan to copy the unicode data to a temporary one-byte-per-character buffer?
>

Of course, the default converter policy we planned does not convert a
unicode object to a raw data buffer via PyUnicode_AS_DATA(). Instead,
C-API functions such as PyUnicode_AsUTF8String() and
PyUnicode_AsEncodedString() will be used to convert the unicode object
to bytes, which are then converted to char* and passed to your C++
function.
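A rough Python sketch of that two-step path, using ctypes to stand in for the C++ side. The helper name unicode_to_char_buffer is hypothetical. The encode step corresponds to PyUnicode_AsUTF8String()/PyUnicode_AsEncodedString(), and the NUL-terminated buffer plays the role of the char* obtained from the resulting bytes object (for example via PyBytes_AsString()).

```python
import ctypes


def unicode_to_char_buffer(text: str, encoding: str = "utf-8") -> ctypes.Array:
    """Hypothetical helper: str -> bytes -> NUL-terminated char buffer."""
    encoded = text.encode(encoding)              # step 1: str -> bytes
    return ctypes.create_string_buffer(encoded)  # step 2: bytes -> char[] + NUL


buf = unicode_to_char_buffer("hello")
print(buf.value)           # b'hello'
print(ctypes.sizeof(buf))  # 6: five characters plus the terminating NUL

# For contrast: a raw 2-byte-per-character buffer (emulated with UTF-16-LE)
# stops at the first embedded NUL when read as a C string.
raw = ctypes.create_string_buffer("hello".encode("utf-16-le"))
print(raw.value)           # b'h'
```

The last lines show why the raw buffer is not a safe default: with two bytes per character, a function that treats its const char* argument as a NUL-terminated string sees only "h".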

By default we would use PyUnicode_AsUTF8String(); a different encoding
could be specified explicitly through a converter policy. That should
keep most of your code compatible, since ASCII text still encodes to
one byte per character in UTF-8.
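One way such a policy could look, sketched in Python with a made-up CharPtrPolicy class (not an existing Boost.Python API). The default codec is UTF-8, so pure ASCII input still arrives as one byte per character; characters outside ASCII become multi-byte sequences unless a single-byte encoding such as latin-1 is chosen explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharPtrPolicy:
    """Hypothetical converter policy: which codec turns str into char* data."""

    encoding: str = "utf-8"  # default mirrors PyUnicode_AsUTF8String()
    errors: str = "strict"   # raise on characters the codec cannot encode

    def to_bytes(self, text: str) -> bytes:
        # Same role as PyUnicode_AsEncodedString(unicode, encoding, errors).
        return text.encode(self.encoding, self.errors)


default = CharPtrPolicy()
latin1 = CharPtrPolicy(encoding="latin-1")

print(default.to_bytes("hello"))       # b'hello' (ASCII: one byte per char)
print(default.to_bytes("caf\u00e9"))   # b'caf\xc3\xa9' (two bytes for U+00E9)
print(latin1.to_bytes("caf\u00e9"))    # b'caf\xe9' (one byte per char)
```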

Thanks!

-- Haoyu Bai

