[I18n-sig] Unicode surrogates: just say no!

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 29 Jun 2001 00:48:25 +0200


> > The rationale for supporting \U is two-fold: One, importing a module
> > should not fail in one installation, and succeed in another (of the
> > same Python version). Running the module may give different results,
> > but you should be able to generate byte code. 
> 
> Isn't it already the case that big Python integer literals can be legal
> on one platform and illegal on another? (I don't know, I just thought
> that was the case....)

I guess so; I'm not even sure you can exchange byte code files across
machines with different sizeof(long).

OTOH, I think this is a real problem, and we should not extend this
problem into other areas as well. Furthermore, if you encounter a
source incompatibility between installations because of very large
integers, you can switch to long integers with little effort. The same
is not that easy for Unicode literals.

> What are the chances that anybody is in this situation in the near
> future? Can you even display these characters on Windows? Does Tk
> support them? And if so, on what platforms? 

I'm pretty sure that Tk will be able to display them soon after fonts become
available. I believe the X11 fonts support full ISO 10646. Since Tk
uses UTF-8, it is also capable of representing these characters
internally. For Windows, I don't know the power of TrueType/OpenType
in this respect, but I'd assume they have thought of UTF-16 already.
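
To illustrate the representation point, here is a minimal sketch in
present-day Python 3 (an assumption; it is not the 2001 interpreter under
discussion). It takes U+20000, chosen here only as an example plane 2 code
point, and shows that UTF-8 encodes it directly in four bytes, while UTF-16
needs a surrogate pair of two 16-bit code units.

```python
# Example character (chosen for illustration): U+20000, a plane 2 code point.
ch = "\U00020000"

# UTF-8 represents the code point directly as a four-byte sequence.
utf8 = ch.encode("utf-8")

# UTF-16 (big-endian, no BOM) represents it as two 16-bit code units.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# The same surrogate pair computed by hand from the code point.
offset = ord(ch) - 0x10000
high = 0xD800 + (offset >> 10)    # high (lead) surrogate
low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate

print(utf8.hex(), [hex(u) for u in units], hex(high), hex(low))
```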

As for the fonts themselves, I've seen PDF files for the plane 2
characters, so I guess fonts are available *somewhere*.

> What about the Java APIs?

I could not care less about the Unicode capabilities of Java.

> Wide Python builds may be the "default" before these characters become
> practically usable in GUIs.

That would be a good thing, since I think infrastructure needs to be
built from the ground up (operating system, programming language, GUI
libraries, applications).

Given that it is much easier to support representing the characters in
Python than producing a font, it seems only natural that Python can
represent them first. Python won't have a lot of other facilities
needed for processing them (like character properties, combining,
sorting, etc), but the representation should work fairly early.
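
As a rough sketch of what "the representation should work" means, the
following uses present-day Python 3 (an assumption; the builds discussed in
this thread are Python 2.x narrow and wide builds, where the result of len()
could differ). A \U escape in a literal yields a single code point that
round-trips through ord() and chr().

```python
# A \U escape names a code point outside the Basic Multilingual Plane.
s = "\U00020000"

# In current Python 3 the string holds one code point, not a surrogate pair.
length = len(s)
code_point = ord(s)

# The plane number is the code point divided by 0x10000.
plane = code_point >> 16

# chr() rebuilds the same one-character string from the integer.
round_trip = chr(code_point)

print(length, hex(code_point), plane, round_trip == s)
```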

Regards,
Martin