[Python-Dev] Support for "wide" Unicode characters

Guido van Rossum guido@digicool.com
Mon, 02 Jul 2001 11:29:39 -0400


> Greg Ewing wrote:
> > 
> > > It so happened that the Unicode support was written to make it very
> > > easy to change the compile-time code unit size
> > 
> > What about extension modules that deal with Unicode strings?
> > Will they have to be recompiled too? If so, is there anything
> > to detect an attempt to import an extension module with an
> > incompatible Unicode character width?
> 
> That's a good question!
> 
> The answer is: yes, extensions which use Unicode will have to
> be recompiled for narrow and wide builds of Python. The question,
> however, is how to detect cases where the user imports an
> extension built for narrow Python into a wide build and
> vice versa.
> 
> The standard way of checking the API level won't help. We'd
> need some form of introspection API at the C level... hmm,
> perhaps looking at the sys module will do the trick for us?!
> 
> In any case, this is certainly going to cause trouble one
> of these days...

Here are some alternative ways to deal with this:

(1) Use the preprocessor to rename all the Unicode APIs so that "Wide"
    is appended to their names in wide mode.  Any use of a Unicode API
    in an extension compiled for the wrong Py_UNICODE_SIZE then fails
    with a link-time error (which, for shared libraries, should surface
    as an ImportError).

(2) Ditto, but rename only the PyModule_Init function.  This is much
    less work but coarser: a module that doesn't use any Unicode APIs
    (and I expect these will be a large majority) would still be
    rejected by a build with the other Unicode width.

(3) Change the interpretation of PYTHON_API_VERSION so that a low bit
    of 1 means wide Unicode.  A mismatch then produces only a warning at
    import time (followed by a core dump as soon as the extension
    actually tries to use Unicode).

I mentioned (1) and (3) in an earlier post.
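A rough C sketch of alternative (3), with its assumptions labeled: BASE_API_VERSION is an illustrative number rather than the real PYTHON_API_VERSION value, and warn_on_version_mismatch() is a hypothetical toy version of an import-time check, not the interpreter's own code. It shows the low bit carrying the Unicode width, and why a mismatch does not stop the module from loading.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: an illustrative base API version, not the real value. */
#define BASE_API_VERSION 1010

/* Alternative (3): a low bit of 1 in the API version means wide Unicode. */
#define WIDE_UNICODE_FLAG 1

/* Version number a build would advertise for a given code unit size. */
int api_version_for(int unicode_size)
{
    int base = BASE_API_VERSION & ~WIDE_UNICODE_FLAG; /* keep low bit free */
    assert(unicode_size == 2 || unicode_size == 4);
    return unicode_size == 4 ? (base | WIDE_UNICODE_FLAG) : base;
}

int is_wide_build(int api_version)
{
    return (api_version & WIDE_UNICODE_FLAG) != 0;
}

/* Hypothetical import-time check: a mismatch only prints a warning and
   the module is loaded anyway, so a wrong-width module can crash later.
   Returns 1 if a warning was printed, 0 otherwise. */
int warn_on_version_mismatch(const char *module, int module_version,
                             int core_version)
{
    if (module_version == core_version)
        return 0;
    fprintf(stderr,
            "warning: C API version mismatch for module %s: "
            "interpreter has %d (%s Unicode), module has %d (%s Unicode)\n",
            module, core_version,
            is_wide_build(core_version) ? "wide" : "narrow",
            module_version,
            is_wide_build(module_version) ? "wide" : "narrow");
    return 1;
}
```

For example, warn_on_version_mismatch("spam", api_version_for(2), api_version_for(4)) prints the warning and returns 1 for the hypothetical module "spam", yet nothing in this scheme prevents that module from being imported.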

--Guido van Rossum (home page: http://www.python.org/~guido/)