[Python-Dev] The C API and wide unicode support
Walter Dörwald
walter@livinglogic.de
Wed, 10 Jul 2002 18:00:17 +0200
Michael Hudson wrote:
> Walter Dörwald <walter@livinglogic.de> writes:
>
>>Guido van Rossum wrote:
>>
>>>>Any C function that uses Unicode objects in any way needs name
>>>>mangling, because the storage layout of the Unicode objects
>>>>changes.
>>>
>>>Really? If I am only using the published APIs and not peeking
>>>directly inside the Unicode object, why should I care about its
>>>internal lay-out?
>>
>>That's what I meant by "using". Functions that only pass
>>unicode objects around don't need to know (as long as they pass
>>the objects only to functions that themselves either "know"
>>or "don't need to know" the layout).
>>
>>PyUnicode_Decode creates unicode objects, so I guess it needs
>>to know.
>
> *It* needs to know, yes. But surely the caller doesn't?
This depends on what the caller does with the result of
PyUnicode_Decode.
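To make the distinction concrete, here is a minimal C sketch. It does not use Python.h; the types and functions (fake_unicode, narrow_unit, wide_unit, forward_length, read_as_narrow, read_as_wide) are hypothetical stand-ins for a Unicode object whose code units are 2 bytes in a narrow build and 4 bytes in a wide one. A caller that only forwards the object never depends on the storage, while a caller that indexes the buffer hard-codes the code-unit width it was compiled with.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two storage layouts a build can choose. */
typedef uint16_t narrow_unit;   /* 2-byte code units (narrow build) */
typedef uint32_t wide_unit;     /* 4-byte code units (wide build) */

/* Hypothetical, simplified "unicode object": a length plus a buffer
   whose element size depends on how the interpreter was built. */
typedef struct {
    size_t length;       /* number of code units */
    const void *data;    /* code-unit buffer */
} fake_unicode;

/* Layout-independent: only looks at the length field, never the buffer,
   much like code that just passes the object on to other API functions. */
size_t forward_length(const fake_unicode *u)
{
    return u->length;
}

/* Layout-dependent: reads code unit i assuming 2-byte storage.
   memcpy is used to reinterpret the bytes without aliasing problems. */
narrow_unit read_as_narrow(const fake_unicode *u, size_t i)
{
    narrow_unit unit;
    memcpy(&unit, (const unsigned char *)u->data + i * sizeof unit, sizeof unit);
    return unit;
}

/* Layout-dependent: reads code unit i assuming 4-byte storage. */
wide_unit read_as_wide(const fake_unicode *u, size_t i)
{
    wide_unit unit;
    memcpy(&unit, (const unsigned char *)u->data + i * sizeof unit, sizeof unit);
    return unit;
}
```

If the buffer actually holds wide units for u"AB", read_as_wide returns 'A' and 'B', but read_as_narrow(u, 1) returns half of the 'A' unit (which half depends on byte order), so code compiled for the wrong width silently misreads the string.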
>>>Shouldn't only functions whose signature uses PY_UNICODE_TYPE be
>>>name-mangled? What am I missing?
>>
>>What about the functions that use the C macros (PyUnicode_AS_UNICODE
>>etc.) directly or indirectly? Those functions will rely on the
>>internal lay-out.
>
> They're verboten in extension modules anyway, so I don't care.
I didn't know that. Neither Include/unicodeobject.h nor
Doc/api/concrete.tex mention it. Is there any other location
where this is mentioned?
I think forbidding the use of the macros is too restrictive.
What if I want to implement a version of
   (foo.replace(u"&", u"&amp;")
       .replace(u"<", u"&lt;")
       .replace(u"\"", u"&quot;")
       .replace(u">", u"&gt;"))
in C for performance reasons? How is this possible without
using the C macros?
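A rough sketch of what such a C version might look like is below. To keep it compilable without Python.h, a plain array of the hypothetical type unit_t stands in for the buffer; in a real extension the pointer and length would come from PyUnicode_AS_UNICODE and PyUnicode_GET_SIZE, and the element type would be Py_UNICODE, whose size depends on the narrow/wide build. The function names entity_for and escape_units are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Py_UNICODE (4-byte units, as in a wide build). */
typedef uint32_t unit_t;

/* Returns the replacement entity for a character that needs escaping,
   or NULL if the character is copied unchanged. */
static const char *entity_for(unit_t c)
{
    switch (c) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '"': return "&quot;";
    default:  return NULL;
    }
}

/* Escapes src[0..len) in two passes over the input: the first computes
   the exact output length, the second fills a single allocation.
   Stores the output length in *out_len; returns NULL if malloc fails. */
unit_t *escape_units(const unit_t *src, size_t len, size_t *out_len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        const char *e = entity_for(src[i]);
        n += e ? strlen(e) : 1;
    }

    /* Allocate at least one unit so an empty result is still non-NULL. */
    unit_t *dst = malloc((n ? n : 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        const char *e = entity_for(src[i]);
        if (e != NULL) {
            while (*e != '\0')
                dst[j++] = (unit_t)(unsigned char)*e++;
        } else {
            dst[j++] = src[i];
        }
    }
    *out_len = n;
    return dst;
}
```

Because each input character is examined exactly once, the single pass cannot double-escape the "&" in an entity it just produced, so the ordering constraint of the replace() chain ("&" first) disappears. It also avoids the four intermediate strings the chain creates.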
And if extension modules are not allowed to access the internal
layout of unicode objects, what's the use of name mangling?
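For reference, the mangling technique itself can be sketched as follows. CPython's headers map public names such as PyUnicode_Decode to width-specific symbols (PyUnicodeUCS2_Decode or PyUnicodeUCS4_Decode) depending on Py_UNICODE_WIDE. The sketch reproduces the idea with hypothetical names (DEMO_UNICODE_WIDE, demo_decode, demo_build_name) so it compiles on its own. The point is that a module built against one width references a symbol the other build does not export, so the mismatch shows up as a link or import error instead of silent misreads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch, playing the role of Py_UNICODE_WIDE. */
#ifdef DEMO_UNICODE_WIDE
typedef uint32_t demo_unit;               /* 4-byte code units */
#define demo_decode demo_decodeUCS4       /* width-specific symbol */
#else
typedef uint16_t demo_unit;               /* 2-byte code units */
#define demo_decode demo_decodeUCS2
#endif

/* Callers write demo_decode(); the preprocessor renames it, so the
   object file records demo_decodeUCS2 or demo_decodeUCS4 as the symbol.
   Widens each byte of s into a code unit (Latin-1 style), up to cap units. */
size_t demo_decode(const char *s, demo_unit *out, size_t cap)
{
    size_t n = 0;
    while (n < cap && s[n] != '\0') {
        out[n] = (demo_unit)(unsigned char)s[n];
        n++;
    }
    return n;
}

/* Reports which variant this translation unit was compiled as. */
const char *demo_build_name(void)
{
#ifdef DEMO_UNICODE_WIDE
    return "UCS4";
#else
    return "UCS2";
#endif
}
```

Seen this way, mangling only pays off if some extension code does depend on the layout, which is exactly the kind of code the macros enable.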
Bye,
Walter Dörwald