[Python-Dev] The C API and wide unicode support

Walter Dörwald walter@livinglogic.de
Wed, 10 Jul 2002 18:00:17 +0200


Michael Hudson wrote:

> Walter Dörwald <walter@livinglogic.de> writes:
> 
>>Guido van Rossum wrote:
>>
>>
>>>>Any C function that uses Unicode objects in any way needs name
>>>>mangling, because the storage layout of the Unicode objects
>>>>changes.
>>>
>>>
>>>Really?  If I am only using the published APIs and not peeking
>>>directly inside the Unicode object, why should I care about its
>>>internal lay-out?
>>
>>That's what I meant by "using". Functions that only pass
>>unicode objects around don't need to know (as long as they pass
>>the objects only to functions that themselves either "know"
>>or "don't need to know" the layout).
>>
>>PyUnicode_Decode creates unicode objects, so I guess it needs
>>to know.
> 
> *It* needs to know, yes.  But surely the caller doesn't?

This depends on what the caller does with the result of
PyUnicode_Decode.

>>>Shouldn't only functions whose signature uses PY_UNICODE_TYPE be
>>>name-mangled?  What am I missing?
>>
>>What about the functions that use the C macros (PyUnicode_AS_UNICODE
>>etc.) directly or indirectly? Those functions will rely on the
>>internal lay-out.
> 
> They're verboten in extension modules anyway, so I don't care.

I didn't know that. Neither Include/unicodeobject.h nor
Doc/api/concrete.tex mentions it. Is it documented in any
other location?

I think forbidding the use of the macros is too restrictive.
What if I want to implement a version of
    foo.replace(u"&", u"&amp;")
       .replace(u"<", u"&lt;")
       .replace(u"\"", u"&quot;")
       .replace(u">", u"&gt;")
in C for performance reasons? How is this possible without
using the C macros?
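
To make the example concrete: the chained calls walk the string
four times and build three intermediate strings. A C version would
presumably make a single pass over the character buffer instead.
Here is a rough sketch of that single-pass logic, written in Python
rather than C so it stays independent of the layout question. The
names escape_chained, escape_single_pass and _ESCAPES are made up
for illustration and are not part of any API:

```python
# Illustrative only: these names are hypothetical, not an existing API.
_ESCAPES = {
    u"&": u"&amp;",
    u"<": u"&lt;",
    u"\"": u"&quot;",
    u">": u"&gt;",
}


def escape_chained(foo):
    # The chained form from the example: four passes over the data.
    return (foo.replace(u"&", u"&amp;")
               .replace(u"<", u"&lt;")
               .replace(u"\"", u"&quot;")
               .replace(u">", u"&gt;"))


def escape_single_pass(foo):
    # One pass: look at each character once and emit either the
    # character itself or its replacement.
    out = []
    for ch in foo:
        out.append(_ESCAPES.get(ch, ch))
    return u"".join(out)
```

Both functions should give the same result for any input, because
"&" is handled before the replacements that introduce new "&"
characters. A C version of the inner loop would need to read the
individual characters, which is where the macros come in.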

And if extension modules are not allowed to access the internal
layout of unicode objects, what's the use of name mangling?

Bye,
    Walter Dörwald