[Python-ideas] Support Unicode code point notation

Fri Aug 2 03:08:58 CEST 2013

On Sat, Jul 27, 2013 at 6:01 AM, Steven D'Aprano <steve at pearwood.info> wrote:

> Why do we need yet another way of writing escape sequences?
> ------------------------------------------------------------
>
> We don't need another one, we need a better one. U+xxxx is the standard
> Unicode notation, while existing Python escapes have various problems.
>

The current situation with \u and \U escapes can hardly qualify as an
obvious way to do it.  There is nothing obvious about either the \u limit
of four digits or the \U requirement of exactly eight.  (I remember
discovering that after first trying something like \u1FFFF, then \U1FFFF,
and then checking the reference manual to discover \U0001FFFF.  I don't
think my experience was unique.)
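
For illustration, the three attempts above behave roughly as follows. This is a small sketch assuming Python 3.3 or later, where str literals accept both escapes and len() counts code points. The name short_form_accepted is made up for the example.

```python
# "\u" consumes exactly four hex digits, so the fifth digit is left over
# as an ordinary character.
four_digit = "\u1FFFF"
print(len(four_digit))            # 2: U+1FFF followed by "F"

# "\U" needs exactly eight hex digits; a shorter form is rejected when the
# literal is compiled, so it is compiled from a string here to catch that.
try:
    compile(r'"\U1FFFF"', "<example>", "eval")
    short_form_accepted = True
except SyntaxError:
    short_form_accepted = False
print(short_form_accepted)        # False

# Only the zero-padded eight-digit form names the intended code point.
eight_digit = "\U0001FFFF"
print(len(eight_digit), hex(ord(eight_digit)))   # 1 0x1ffff
```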

I have a counter-proposal that may improve the situation: allow 4, 5, 6 or
8 hex digits after \U, optionally surrounded by braces.  When used without
braces, the maximal munch rule applies: the escape sequence ends at the
first non-hex-digit.  I would allow only upper-case A-F in 4-6 digit
escapes to minimize the need for braces.
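
One possible reading of this rule can be sketched as a small decoder over raw text. It is not an implementation of anything that exists in Python. The names decode_proposed, _ESCAPE and _is_valid are hypothetical, and several details are assumptions the proposal leaves open: the existing eight-digit form is tried first and keeps accepting lower-case digits, and a run of any other length (for example 7 digits) is treated as an error.

```python
import re

# Alternatives are tried in order: braced form, the existing 8-digit form
# (any case, assumed to take precedence for backward compatibility), then a
# maximal run of digits drawn from 0-9 and upper-case A-F only.
_ESCAPE = re.compile(
    r"\\U(?:"
    r"\{(?P<braced>[0-9A-Fa-f]*)\}"
    r"|(?P<eight>[0-9A-Fa-f]{8})"
    r"|(?P<run>[0-9A-F]*)"
    r")"
)


def _is_valid(digits):
    """8 digits in any case, or 4-6 digits with upper-case A-F only."""
    if len(digits) == 8:
        return True
    return len(digits) in (4, 5, 6) and digits == digits.upper()


def decode_proposed(text):
    """Expand \\U escapes in raw text under the assumed reading of the rule."""
    def replace(match):
        digits = (match.group("braced") or match.group("eight")
                  or match.group("run") or "")
        if not _is_valid(digits):
            raise ValueError("bad \\U escape: %r" % match.group(0))
        value = int(digits, 16)
        if value > 0x10FFFF:
            raise ValueError("code point out of range: %r" % match.group(0))
        return chr(value)

    return _ESCAPE.sub(replace, text)


print(decode_proposed(r"\U1FFFF") == "\U0001FFFF")          # True
print(decode_proposed(r"\U{1FFFF}F") == "\U0001FFFF" + "F")  # True
print(decode_proposed(r"\U00E9cole") == "\u00e9cole")        # True
```

In the last call the lower-case "c" ends the escape without braces, which is the point of restricting the short forms to upper-case A-F. Braces are still needed when the following character is itself a digit or an upper-case A-F, as in the second call.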