[Python-ideas] Support Unicode code point notation

Stephen J. Turnbull stephen at xemacs.org
Sun Jul 28 09:24:56 CEST 2013


Greg Ewing writes:
 > Steven D'Aprano wrote:
 > > Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code 
 > > points go up to hex 10FFFF,
 > 
 > They do *now*, but we can't be sure that they will stay that
 > way in the future.

In Unicode, they will.  Blood was shed over the issue in the ISO 10646
committees before the standards could be unified.  Huge amounts of
software validate UTF-8 and UTF-16, including checking that code points
stay within that range, and won't easily be converted to accept
extended ranges.  So Unicode and ISO 10646 will stay within the current
17 planes.  To go beyond that they'll need a new standard.
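
To make the range concrete, here is a minimal sketch of the kind of
check such validating software performs.  The helper name
is_valid_code_point is made up for illustration; the only facts it
relies on are the upper limit of hex 10FFFF and the layout of 17
blocks of 65,536 code points each (planes, in Unicode terminology).

```python
import sys

MAX_CODE_POINT = 0x10FFFF   # highest Unicode code point
PLANES = 17                 # planes 0 through 16
PLANE_SIZE = 0x10000        # 65,536 code points per plane

# Size of the whole code space: 17 * 65,536 = 1,114,112
TOTAL_CODE_POINTS = PLANES * PLANE_SIZE


def is_valid_code_point(cp):
    """Hypothetical helper: True if cp lies inside the Unicode code space."""
    return isinstance(cp, int) and 0 <= cp <= MAX_CODE_POINT


if __name__ == "__main__":
    # CPython 3.3 and later report the same limit.
    print(sys.maxunicode == MAX_CODE_POINT)
    print(is_valid_code_point(0x10FFFF), is_valid_code_point(0x110000))
```

Python's own chr() enforces the same limit: chr(0x10FFFF) succeeds,
while chr(0x110000) raises ValueError.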

In any case, it seems really unlikely that more than 1,000,000 code
points will ever be needed, unless there's a mutation that makes all
of *us* obsolete.

 > The Ruby \U{...} syntax has the following advantages:

So does the \N{U+XXXX} proposal, and it has the further advantage of
making the semantics obvious: U+XXXX is a name for this character or
code point, which is consistent with how the standard actually uses
the U+XXXX syntax.
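
As a rough illustration of those semantics, the sketch below emulates
the proposed notation at run time.  Python string literals do not
accept \N{U+XXXX}; that is only the proposal under discussion.  The
function name expand_u_plus and the choice to accept 4 to 6 hex
digits are my assumptions, not part of any proposal text.

```python
import re

# Matches the literal text \N{U+XXXX} with 4 to 6 hex digits (assumed form).
_U_PLUS = re.compile(r"\\N\{U\+([0-9A-Fa-f]{4,6})\}")


def expand_u_plus(text):
    """Hypothetical helper: replace each \\N{U+XXXX} with that character."""
    def repl(match):
        cp = int(match.group(1), 16)
        if cp > 0x10FFFF:
            # Six hex digits can exceed the Unicode range, so check it.
            raise ValueError("code point out of range: U+%X" % cp)
        return chr(cp)
    return _U_PLUS.sub(repl, text)


if __name__ == "__main__":
    # Raw strings keep the backslash, so the helper sees the notation as text.
    print(expand_u_plus(r"caf\N{U+00E9}"))
```

Text that does not match the pattern is left alone, and a value above
hex 10FFFF is rejected rather than silently accepted.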


