[I18n-sig] How does Python Unicode treat surrogates?

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 26 Jun 2001 01:26:51 +0200


> I don't think switching to a 32-bit character is the right thing to do
> for us (although I think it should be easier than it currently is --
> changing to define Py_UNICODE as a 32-bit unsigned int should be all
> that it takes, which is currently not the case).
> 
> I'm all for taking the lazy approach and letting applications that
> need surrogate support do it themselves, at the application level.

That, of course, means that you cast the 16-bit Py_UNICODE in stone.
With a 32-bit Py_UNICODE, unichr(0xd800) would surely be illegal,
wouldn't it? So an application that explicitly creates surrogates
using unichr (how else would it do that?) won't be portable to a
32-bit Py_UNICODE.
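
To make the portability concern concrete: an application doing surrogate
handling itself has to compute the two 16-bit code units for each character
outside the BMP, roughly as sketched below. The helper name
split_to_surrogates is hypothetical, and the sketch works on plain integers
rather than calling unichr, so it does not depend on the width of Py_UNICODE.

```python
def split_to_surrogates(code_point):
    """Return the (high, low) UTF-16 surrogate pair for a code point
    in the range 0x10000..0x10FFFF. Hypothetical helper for illustration."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low
```

Passing the two resulting values to unichr is the step that would presumably
only make sense on a 16-bit build.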

Would you accept patches that deal with surrogate pairs transparently
throughout the implementation, in the sense of mapping them to
ordinals of 0x10000 and above?
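
As a rough illustration of what such a mapping involves, the sketch below
walks a sequence of 16-bit code units and combines each high/low surrogate
pair into a single ordinal. The function name code_units_to_ordinals is
hypothetical, and passing unpaired surrogates through unchanged is an
assumption made for the example, not a statement about what a patch would do.

```python
def code_units_to_ordinals(units):
    """Map a sequence of 16-bit code units to code point ordinals,
    combining surrogate pairs. Hypothetical helper for illustration."""
    ordinals = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Each surrogate contributes 10 bits; add back the 0x10000 offset.
            low = units[i + 1]
            ordinals.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP code unit, or an unpaired surrogate (passed through: assumption).
            ordinals.append(unit)
            i += 1
    return ordinals
```

For example, the units 0xD834, 0xDD1E combine into the single ordinal 0x1D11E.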

Regards,
Martin