[Python-ideas] Support Unicode code point notation

Bruce Leban bruce at leapyear.org
Fri Aug 2 02:04:56 CEST 2013


On Thu, Aug 1, 2013 at 4:55 PM, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote:

> On Thu, Aug 1, 2013 at 7:20 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>> I'd never even heard of code point labels before this thread, while the
>> "U+" notation is incredibly common.
>
>
> <snip>
>
> The original proposal was to allow \U+NNNN escape as a shortcut for
> \U0000NNNN.  This is a clear readability improvement while \N{U+001B}, for
> example, is not an improvement over \N{ESCAPE}.  However, for more obscure
> control characters, \N{control-NNNN} may be clearer than any currently
> available spelling.  For example, \N{control-001E} is easier to
> understand than \036, \x1e, \u001E, \N{RS} or even the most verbose
> \N{INFORMATION SEPARATOR TWO}.
>
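For reference, a quick check (assuming Python 3.3 or later, where \N{...} and unicodedata.lookup accept name aliases) that the existing spellings of U+001E listed above all produce the same one-character string. The variable name is illustrative only.

```python
import unicodedata

# Existing spellings of U+001E mentioned in the quoted message.
spellings = [
    "\036",                                          # octal escape
    "\x1e",                                          # hex escape
    "\u001E",                                        # 4-digit code point escape
    "\U0000001E",                                    # the 8-digit form \U+NNNN would shorten
    "\N{RS}",                                        # abbreviation alias
    "\N{INFORMATION SEPARATOR TWO}",                 # control-name alias
    unicodedata.lookup("INFORMATION SEPARATOR TWO"), # runtime lookup of the same alias
]

# Every spelling yields the same single character.
print(len(set(spellings)))  # 1
```

None of these spellings shows the numeric code point and its nature together, which is the gap the quoted control-NNNN label is meant to fill.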

My reason for suggesting it is that the standard defines it as the label
for these characters, so it's reasonable to expect lookup to know about
these labels just as it knows about 'EXCLAMATION MARK'. If someone has
created data using the standard and passes it to unicodedata.lookup, it should
work. I'm +/-0 on having 'control-' and 'reserved-' etc. simply being
different spellings of 'U+' so that '\N{control-0021}' == '\N{U+0021}' ==
'\x21' == '!' even though that isn't a control character. That is, if the
data doesn't conform to the standard, it wouldn't necessarily be terrible
if it did something reasonable rather than raising an exception.

And, I'm only suggesting this be supported on the reading side.
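A minimal sketch of that read-side behaviour, written as a hypothetical wrapper named lookup_label (it is not part of unicodedata). It assumes the label prefixes control-, reserved-, noncharacter-, private-use- and surrogate- plus U+, and it implements the lenient option: every prefix is just another spelling of U+, so no check is made that the code point really belongs to that class.

```python
import re
import unicodedata

# Hypothetical pattern: a label prefix followed by 4 to 6 hex digits.
# Prefixes are treated as interchangeable spellings of "U+" (lenient option).
_LABEL = re.compile(
    r"(?:U\+|control-|reserved-|noncharacter-|private-use-|surrogate-)"
    r"([0-9A-F]{4,6})",
    re.IGNORECASE,
)


def lookup_label(name: str) -> str:
    """Hypothetical read-side wrapper around unicodedata.lookup."""
    match = _LABEL.fullmatch(name)
    if match is None:
        # Not a label: defer to the normal name/alias lookup (raises KeyError).
        return unicodedata.lookup(name)
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF:
        # Mirror unicodedata.lookup, which signals failure with KeyError.
        raise KeyError(f"code point out of range: {name!r}")
    return chr(code_point)
```

With this sketch, lookup_label("control-001E") returns "\x1e", and both lookup_label("U+0021") and lookup_label("control-0021") return "!". A strict variant could instead reject "control-NNNN" unless unicodedata.category(chr(code_point)) == "Cc".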

--- Bruce