[Python-Dev] Support for "wide" Unicode characters

M.-A. Lemburg mal@lemburg.com
Mon, 02 Jul 2001 18:51:58 +0200


Guido van Rossum wrote:
> 
> > Greg Ewing wrote:
> > >
> > > > It so happened that the Unicode support was written to make it very
> > > > easy to change the compile-time code unit size
> > >
> > > What about extension modules that deal with Unicode strings?
> > > Will they have to be recompiled too? If so, is there anything
> > > to detect an attempt to import an extension module with an
> > > incompatible Unicode character width?
> >
> > That's a good question!
> >
> > The answer is: yes, extensions which use Unicode will have to
> > be recompiled for narrow and wide builds of Python. The question
> > is however, how to detect cases where the user imports an
> > extension built for narrow Python into a wide build and
> > vice versa.
> >
> > The standard way of looking at the API level won't help. We'd
> > need some form of introspection API at the C level... hmm,
> > perhaps looking at the sys module will do the trick for us?!
> >
> > In any case, this is certainly going to cause trouble one
> > of these days...
> 
> Here are some alternative ways to deal with this:
> 
> (1) Use the preprocessor to rename all the Unicode APIs to get "Wide"
>     appended to their name in wide mode.  This makes any use of a
>     Unicode API in an extension compiled for the wrong Py_UNICODE_SIZE
>     fail with a link-time error.  (Which should cause an ImportError
>     for shared libraries.)
>
> (2) Ditto but only rename the PyModule_Init function.  This is much
>     less work but more coarse: a module that doesn't use any Unicode
>     APIs (and I expect these will be a large majority) still would not
>     be accepted.
> 
> (3) Change the interpretation of PYTHON_API_VERSION so that a low bit
>     of '1' means wide Unicode.  Then you only get a warning (followed
>     by a core dump when actually trying to use Unicode).
>
> I mentioned (1) and (3) in an earlier post.
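Option (1) can be illustrated with a minimal sketch. This is not CPython code. `HYPO_UNICODE_SIZE` stands in for `Py_UNICODE_SIZE`, and `HypoUnicode_GetSize`, `HypoUnicodeObject` and `hypo_linked_symbol` are hypothetical names. In a wide build the preprocessor appends "Wide" to the API name, so the symbol an extension references records the width it was compiled for. A narrow extension loaded into a wide interpreter would therefore look for a symbol that the wide core never defines.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for Py_UNICODE_SIZE (hypothetical); override with
   -DHYPO_UNICODE_SIZE=2 to simulate a narrow build. */
#ifndef HYPO_UNICODE_SIZE
#define HYPO_UNICODE_SIZE 4
#endif

/* Turn a macro's *expansion* into a string literal. */
#define HYPO_STR(x) #x
#define HYPO_XSTR(x) HYPO_STR(x)

/* Wide builds rename every public Unicode API, so the linked symbol
   name encodes the code unit size. */
#if HYPO_UNICODE_SIZE == 4
#define HypoUnicode_GetSize HypoUnicode_GetSizeWide
#endif

typedef struct {
    long length;
} HypoUnicodeObject;

/* The "core" defines the function under whichever name the macro
   selected. An extension compiled for the other width would reference
   the other name and fail to link (an ImportError for a shared module). */
long HypoUnicode_GetSize(const HypoUnicodeObject *u)
{
    return u->length;
}

/* Name of the symbol that callers in this build actually link against. */
static const char *hypo_linked_symbol(void)
{
    return HYPO_XSTR(HypoUnicode_GetSize);
}
```

Compiling the same file with `-DHYPO_UNICODE_SIZE=2` makes `hypo_linked_symbol()` return `"HypoUnicode_GetSize"` instead. That difference in symbol names is the mismatch the dynamic linker would catch.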

(4) Add a feature flag to PyModule_Init() which then looks up the
    features in the sys module and uses them as the basis for
    processing the import request.
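Option (4) might look roughly like this on the interpreter side. This is a sketch under stated assumptions. The module passes a bitmask of build features at init time, and the interpreter compares it with its own features. In the proposal those features would be looked up in the sys module. Here they are simply a parameter. All names (`HYPO_FEATURE_WIDE_UNICODE`, `hypo_features_for_unicode_size`, `hypo_check_module_features`) are hypothetical. An old module that passes no flags reads as a narrow build, so a wide interpreter would refuse it.

```c
#include <assert.h>

/* Hypothetical feature bit: set when compiled with a 4-byte Py_UNICODE. */
#define HYPO_FEATURE_WIDE_UNICODE (1u << 0)

/* Bits that must match exactly between module and interpreter;
   any other bits are informational and ignored by the check. */
#define HYPO_ABI_SENSITIVE_FEATURES HYPO_FEATURE_WIDE_UNICODE

/* Module side: derive the flags from the compile-time code unit size. */
static unsigned int hypo_features_for_unicode_size(int unicode_size)
{
    return unicode_size == 4 ? HYPO_FEATURE_WIDE_UNICODE : 0u;
}

/* Interpreter side: returns 0 if the module may be imported, -1 if it
   must be refused (the caller would turn this into an ImportError).
   A legacy module that supplies no flags passes 0, i.e. "narrow". */
static int hypo_check_module_features(unsigned int module_features,
                                      unsigned int interp_features)
{
    unsigned int diff = (module_features ^ interp_features)
                        & HYPO_ABI_SENSITIVE_FEATURES;
    return diff == 0 ? 0 : -1;
}
```

The mask keeps the check narrow. New informational flags can be added later without rejecting older modules, and only bits that change the binary interface cause an import to be refused.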

In this case, I think that (5) would be the best solution,
since old code will notice the change in width too.

-- 
Marc-Andre Lemburg
________________________________________________________________________
Business:                                        http://www.lemburg.com/
Python Pages:                             http://www.lemburg.com/python/