[Python-Dev] more unicode: \U support?
Tim Peters
tim_one@email.msn.com
Thu, 27 Jul 2000 16:57:58 -0400
[/F]
> would it be a good idea to add \UXXXXXXXX
> (8 hex digits) to 2.0?
>
> (only characters in the 0000-ffff range would
> be accepted in the current version, of course).
[Tim agreed two msgs later; Guido agreed in private]
[MAL]
> I don't really get the point of adding \uXXXXXXXX
No: Fredrik's suggestion is with an uppercase U. He is not proposing to
extend the (lowercase) \u1234 notation.
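
A minimal sketch of the distinction, assuming a present-day Python 3
interpreter rather than the 2.0 under discussion. The variable names are
illustrative only. The lowercase form takes exactly four hex digits and the
uppercase form exactly eight:

```python
# Assumption: run on a modern Python 3, not the 2.0 being discussed here.
short = "\u1234"        # lowercase u: exactly 4 hex digits
wide = "\U00001234"     # uppercase U: exactly 8 hex digits

# Both spellings name the same single character.
same = short == wide

# Lowercase \u stops after 4 digits, so the trailing "1234" is literal text.
trailing = "\u12341234"
trailing_len = len(trailing)
```

So the eight-digit string after a lowercase \u is not read as one character,
which is why the uppercase spelling is a separate notation.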
> when the internal format used is UTF-16 with support for surrogates.
>
> What should \u12341234 map to in a future implementation ?
> Two Python (UTF-16) Unicode characters ?
\U12345678 is C99's ISO 10646 notation; as such, it can't always be mapped
to UTF-16.
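
A rough sketch of why, assuming the standard UTF-16 surrogate arithmetic.
The helper name utf16_units is hypothetical and is not part of any Python
API. Eight hex digits can spell values far above 0x10FFFF, the largest code
point a surrogate pair can reach:

```python
def utf16_units(code_point):
    """Return the UTF-16 code units for a code point (hypothetical helper)."""
    if code_point < 0 or code_point > 0x10FFFF:
        # Beyond the range a surrogate pair can address.
        raise ValueError("not representable in UTF-16: %#x" % code_point)
    if 0xD800 <= code_point <= 0xDFFF:
        # Reserved for surrogates; not a character in its own right.
        raise ValueError("surrogate code points are not encodable")
    if code_point < 0x10000:
        return [code_point]          # one 16-bit unit
    offset = code_point - 0x10000    # 20 bits, split 10/10
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
```

With this sketch, utf16_units(0x1234) yields one unit, utf16_units(0x1F600)
yields a surrogate pair, and utf16_units(0x12345678) raises ValueError.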
> See
>
> http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850
>
> for how Java defines \uXXXX...
Which I pushed for from the start, and nobody is seeking to change.
> We're following an industry standard here ;-)
\U12345678 is also an industry standard, but from a language more recent
than Java, one that had more time to consider the eventual implications of
Unicode's limitations. We reserve the notation now so that it's possible to
outgrow Unicode later.