[I18n-sig] Python Support for "Wide" Unicode characters

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 28 Jun 2001 08:57:34 +0200


> Maybe there is a virtue in having a way to both ask for the largest
> *legal* Unicode character and the largest character that will fit into a
> Python character on the platform. I mean in theory the maximum Unicode
> character is constant but that doesn't mean I want to declare it in my
> programs explicitly.
> 
> unicodedata.maxchar => always TOPCHAR
> sys.maxunicode => some power of 2 - 1
> 
> I'm not entirely happy that we call a thing "sys.maxunicode" and then
> tell people how to generate larger values. How about sys.maxcodeunit.
> (or we could remove the whole surrogate building stuff :) )

-1. The Unicode consortium and ISO have promised that there will never
be characters above 0x10ffff. Most of the characters below TOPCHAR are
"unassigned", whereas the ones above TOPCHAR are "illegal" (or not
even representable in UTF-16).
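
For illustration only, here is a tiny sketch (nothing from the PEP
itself) of the surrogate-pair arithmetic behind that ceiling; a code
point above 0x10ffff simply has no UTF-16 representation:

    # Sketch of UTF-16 surrogate-pair encoding for code points above
    # 0xFFFF; nothing above 0x10FFFF can be expressed this way.
    def to_surrogates(cp):
        assert 0x10000 <= cp <= 0x10FFFF
        cp -= 0x10000                    # 20 bits remain
        hi = 0xD800 + (cp >> 10)         # high (lead) surrogate
        lo = 0xDC00 + (cp & 0x3FF)       # low (trail) surrogate
        return hi, lo

    # Largest encodable value: hi = 0xDBFF, lo = 0xDFFF, which decodes
    # back to 0x10000 + 0xFFFFF = 0x10FFFF, i.e. TOPCHAR.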

If we were to allow putting very large numbers into Unicode strings,
we'd have to check for them in every codec as well. I'd rather
disallow them from Python code, and declare using them in C as
undefined behaviour.
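
Just to illustrate the cost, something along these lines (a made-up
helper, not anything that exists in the codecs today) would have to be
sprinkled over every encoder:

    # Hypothetical range check each codec would need if arbitrary
    # ordinals could end up in Unicode strings.
    def _check_ordinal(ordinal):
        if not 0 <= ordinal <= 0x10FFFF:
            raise UnicodeError("ordinal not in range(0x110000)")

Rejecting such values at the Python level keeps that check out of the
codecs entirely.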

> So there is no way to get the heuristic of "wchar_t if available, UCS-4
> if not". I'm not complaining, just checking. The list of options is just
> two, with ucs2 the default.

I'd be complaining, though, if I weren't so pleased with this PEP
overall.
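
For what it's worth, assuming sys.maxunicode ends up as 0xffff on a
ucs2 build and 0x10ffff on a ucs4 build, a program can still tell at
run time which build it got; a minimal sketch:

    import sys

    # Distinguish a narrow (UCS-2) build from a wide (UCS-4) build.
    if sys.maxunicode == 0xFFFF:
        print "narrow (ucs2) build; non-BMP characters need surrogate pairs"
    else:
        print "wide (ucs4) build; maxunicode =", hex(sys.maxunicode)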

Regards,
Martin