[Python-Dev] The C API and wide unicode support

M.-A. Lemburg mal@lemburg.com
Wed, 10 Jul 2002 23:21:30 +0200


Guido van Rossum wrote:
>>>They're verboten in extension modules anyway, so I don't care.
>>
>>They are not disallowed in extensions... I don't know where you
>>got that idea from.
> 
> 
> Maybe because other macros are often disallowed in (3rd party)
> extensions, the reason being that the macros dig into the internal
> representation, which isn't guaranteed to be binary compatible?  It
> would make sense that the same rules apply to the Unicode macros in
> 3rd party extensions.

Which macros would those be? I modelled the macros in the
Unicode implementation after those of the string
implementation. And those macros are certainly used in
a lot of 3rd party extensions.

> (I admit that these restrictions may be underdocumented.  Nevertheless
> they were intended and I believe they were discussed.)

I guess having the macros in the header files without an
explicit warning marks them as a public interface. That's how
I have used them in tons of code, and I don't think I'm
alone in using this approach.

>>Note that the name mangling is done to prevent an extension
>>which uses Unicode in some way from loading if the interpreter
>>and extension Unicode "widths" don't match.
>>
>>If we allowed this, extensions using the macros would cause
>>memory corruption since they'd index differently. That's not only
>>a potential cause of a seg fault, it's also a security risk.
>
> If there were a way so that only extensions that use the macros or
> APIs whose signature uses Py_UNICODE_TYPE would fail to load, that
> would be better.  But I don't know how to enforce that.

That's certainly possible for the C API, but not for the macros
(without defeating their purpose). You also have a problem
in case the extension defines its own Unicode routines relying
on the Python types and macros, e.g. for extensions which
subclass the Unicode type. These don't necessarily need to
use the APIs, or even the macros... but they do rely on the
binary layout used in the Unicode type.
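
To make the layout dependency concrete, here is a minimal sketch in C.
It is not taken from the Python headers: narrow_unit, wide_unit and the
helper functions are hypothetical stand-ins for a 2-byte and a 4-byte
Py_UNICODE, and the code only illustrates why direct indexing into the
buffer goes wrong when the two sides assume different widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for Py_UNICODE in a narrow and a wide build. */
typedef uint16_t narrow_unit;   /* 2-byte code unit */
typedef uint32_t wide_unit;     /* 4-byte code unit */

/* Byte offset that a macro-style access buf[i] touches for each width. */
size_t narrow_offset(size_t i) { return i * sizeof(narrow_unit); }
size_t wide_offset(size_t i)   { return i * sizeof(wide_unit); }

/*
 * Read element i of a buffer that was written with 4-byte units, but
 * index it the way code compiled for 2-byte units would. n_wide is the
 * number of wide units in buf; the assert keeps the sketch in bounds.
 */
narrow_unit read_as_narrow(const wide_unit *buf, size_t n_wide, size_t i)
{
    narrow_unit out;
    assert(narrow_offset(i) + sizeof(narrow_unit) <= n_wide * sizeof(wide_unit));
    memcpy(&out, (const unsigned char *)buf + narrow_offset(i), sizeof out);
    return out;
}
```

With a two-element wide buffer holding 'A' and 'B', reading index 1
through read_as_narrow() lands inside the storage of the first element
rather than on 'B'. No API call is involved, which is why a check tied
to the C API alone cannot catch this case.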

>>The name mangling does not provide a 100% bulletproof way
>>of preventing this (an extension might use Py_UNICODE and
>>the Unicode macros without touching any of the other C APIs),
>>but it goes a long way in that direction.
> 
> 
> Maybe it goes too far.
> 
> OTOH, Michael, is this really something you cannot live with?  Or is
> it simply a surprise?

I think that the fact that Michael is seeing breakage is
a good thing. Otherwise, he would probably not have noticed
that RedHat chose to use the wide build as default.
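
For readers who have not looked at how such breakage comes about, the
name mangling discussed above can be sketched roughly as follows. All
names here (DEMO_UNICODE_WIDE, Demo_FromUnicode, the DemoUCS2/DemoUCS4
prefixes and the helper functions) are hypothetical, chosen only to
show the idea: the header rewrites the public name per width, so an
extension built for one width references a symbol that an interpreter
built for the other width does not export.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical build switch; 1 stands for a wide (4-byte) build. */
#ifndef DEMO_UNICODE_WIDE
#define DEMO_UNICODE_WIDE 1
#endif

/* Mangle the public name according to the width chosen at compile time. */
#if DEMO_UNICODE_WIDE
#define Demo_FromUnicode DemoUCS4_FromUnicode
#else
#define Demo_FromUnicode DemoUCS2_FromUnicode
#endif

/* Two-step stringification so the macro is expanded before quoting. */
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)

/* The symbol name an extension compiled against this header references. */
const char *demo_required_symbol(void)
{
    return DEMO_STR(Demo_FromUnicode);
}

/*
 * Stand-in for the dynamic linker: loading succeeds only if the name
 * exported by the interpreter matches the name the extension needs.
 */
int demo_can_load(const char *exported_symbol)
{
    assert(exported_symbol != NULL);
    return strcmp(exported_symbol, demo_required_symbol()) == 0;
}
```

An extension compiled with the wide setting would then load against an
interpreter exporting the UCS4 name and fail against one exporting the
UCS2 name, which is the kind of failure a width mismatch produces.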

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/