[Python-Dev] more unicode: \U support?

Tim Peters tim_one@email.msn.com
Thu, 27 Jul 2000 16:57:58 -0400


[/F]
> would it be a good idea to add \UXXXXXXXX
> (8 hex digits) to 2.0?
>
> (only characters in the 0000-ffff range would
>  be accepted in the current version, of course).

[Tim agreed two msgs later; Guido agreed in private]

[MAL]
> I don't really get the point of adding \uXXXXXXXX

No:  Fredrik's suggestion is with an uppercase U.  He is not proposing to
extend the (lowercase) \u1234 notation.
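
A minimal sketch of the distinction, assuming an interpreter in which the proposed 8-digit \U escape is accepted (the variable names are only illustrative):

```python
# Assumption: the interpreter accepts the proposed \UXXXXXXXX escape.
short_form = "\u1234"        # lowercase u: exactly 4 hex digits
long_form = "\U00001234"     # uppercase U: exactly 8 hex digits

# Both spellings name the same single character.
assert short_form == long_form
assert len(long_form) == 1

# The lowercase form never consumes more than 4 digits, so any extra
# digits are ordinary characters that follow the escape.
trailing = "\u12341234"
assert trailing == "\u1234" + "1234"
assert len(trailing) == 5
```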

> when the internal format used is UTF-16 with support for surrogates.
>
> What should \u12341234 map to in a future implementation ?
> Two Python (UTF-16) Unicode characters ?

\U12345678 is C99's ISO 10646 notation; as such, it can't always be mapped
to UTF-16.
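
A rough sketch of why the mapping is only partial, assuming the standard UTF-16 surrogate arithmetic (utf16_code_units is a hypothetical helper, not part of any Python API): eight hex digits can name values far above 10FFFF, which is the highest value a surrogate pair can reach.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units for a code point as a list of ints.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 8 hex digits")
    if code_point > 0x10FFFF:
        # Surrogate pairs cover only 0x10000..0x10FFFF.
        raise ValueError("%08X cannot be expressed in UTF-16" % code_point)
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate values are reserved and are not characters themselves.
        raise ValueError("%04X is a surrogate, not a character" % code_point)
    if code_point < 0x10000:
        return [code_point]  # one 16-bit unit
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


assert utf16_code_units(0x1234) == [0x1234]
assert utf16_code_units(0x10000) == [0xD800, 0xDC00]

try:
    utf16_code_units(0x12345678)
except ValueError as exc:
    print("rejected:", exc)
```

Under these assumptions \U12345678 is a legal spelling in the notation but has no UTF-16 representation at all.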

> See
>
> http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850
>
> for how Java defines \uXXXX...

Which I pushed for from the start, and nobody is seeking to change.

> We're following an industry standard here ;-)

\U12345678 is also an industry standard, but in a more recent language (than
Java) that had more time to consider the eventual implications of Unicode's
limitations.  We reserve the notation now so that it's possible to outgrow
Unicode later.