[I18n-sig] Support for "wide" Unicode characters

Guido van Rossum guido@digicool.com
Thu, 28 Jun 2001 07:33:25 -0400


> >     There is a new (experimental) define:
> > 
> >         #define PY_UNICODE_SIZE 2
> 
> Doesn't sizeof(Py_UNICODE) do the same ?

Not on a Cray!  And not in the C standard.  Ask Tim. :-)

> This introduces an incompatibility between narrow and wide
> builds at run-time. PYC should not be harmed by this since they
> store Unicode strings using UTF-8.

Does UTF-8 transfer isolated surrogates correctly?  I think that's
necessary, otherwise I can't marshal or unmarshal literals containing
those, which means that .pyc files for .py files containing those
can't be read (or maybe aren't portable between wide and narrow
interpreters).
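
A rough sketch of the round-trip in question, assuming a present-day
Python 3 interpreter rather than the 2001 code base under discussion.
The "surrogatepass" error handler and the helper names (roundtrip,
strict_encode_fails) are assumptions made for this illustration, not
part of the proposal:

```python
# Sketch only: shows how an isolated surrogate behaves in the UTF-8
# codec of a modern Python 3. Helper names here are hypothetical.

LONE_HIGH = "\ud800"  # an isolated high surrogate, no low half following


def roundtrip(text):
    """Encode to UTF-8 and decode again, letting surrogates through."""
    data = text.encode("utf-8", "surrogatepass")
    return data, data.decode("utf-8", "surrogatepass")


def strict_encode_fails(text):
    """Return True if the default (strict) UTF-8 encoder rejects text."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


encoded, decoded = roundtrip(LONE_HIGH)
print(encoded)                          # three-byte form of U+D800
print(decoded == LONE_HIGH)             # the surrogate survives the trip
print(strict_encode_fails(LONE_HIGH))   # strict mode refuses it
```

If the codec used for marshalling rejected the lone surrogate, as the
strict mode does here, a literal containing one could not be written to
or read back from a .pyc file.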

Note that I'm OK with the UTF-8 encoder recognizing hi+lo surrogate
pairs and encoding them as one Unicode character, since the decoder
generates surrogates for non-BMP characters on a narrow platform.
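
The pairing arithmetic behind that can be sketched as follows. This is
an illustration of the standard UTF-16 surrogate formula, not code from
the interpreter; the function names combine_surrogates and
split_code_point are made up for the example:

```python
# Sketch only: the standard UTF-16 surrogate arithmetic. Function names
# are hypothetical and do not come from the Python source tree.

def combine_surrogates(hi, lo):
    """Join a high+low surrogate pair into one non-BMP code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid high+low surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def split_code_point(cp):
    """Split a non-BMP code point into the pair a narrow build would store."""
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


hi, lo = split_code_point(0x1F600)
print(hex(hi), hex(lo))
print(hex(combine_surrogates(hi, lo)))

# An encoder that recognizes the pair emits one four-byte sequence for
# the combined code point instead of two three-byte sequences.
print(chr(combine_surrogates(hi, lo)).encode("utf-8"))
```

Encoding the combined code point keeps the byte stream identical on
wide and narrow interpreters, since only the in-memory representation
differs.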

--Guido van Rossum (home page: http://www.python.org/~guido/)