[Python-ideas] Support Unicode code point notation

Nick Coghlan ncoghlan at gmail.com
Sun Jul 28 11:47:08 CEST 2013


On 28 July 2013 19:05, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Steven D'Aprano writes:
>  > On 28/07/13 17:41, Stephen J. Turnbull wrote:
>  > >   > (Sorry, I have forgotten who made that suggestion originally.) That
>  > >   > could be extended to allow multiple space-separated code points:
>  > >   >
>  > >   > \N{U+xxxx U+yyyy U+zzzzz}
>  > >   >
>  > >   > or
>  > >   >
>  > >   > \N{U+xxxx yyyy zzzzz}
>  > >
>  > > This is a modal encoding, which has proved to be a really bad idea in
>  > > its past incarnations.  I hope that extension is never added to
>  > > Python.
>  >
>  > Could you elaborate please? What do you mean "modal encoding", and
>  > what past incarnations are you referring to?
>
> A "modal encoding" is one in which the same combination of code units
> (here, ASCII characters) is interpreted differently depending on
> arbitrarily distant context.

Ah, I had missed the "arbitrarily distant" sense you intended for
modal encoding. Agreed, the fact that Unicode escapes (including \N{})
are each limited to a single code point is a definite win in that
regard.
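
To make the "single code point" property concrete, here is a minimal
sketch (the variable names are illustrative only, not from the thread).
Each escape form in a string literal yields exactly one code point, so
the meaning of an escape never depends on text far away from it:

```python
import unicodedata

# Three escape notations for the same character, U+00E9.
by_name = "\N{LATIN SMALL LETTER E WITH ACUTE}"
by_hex4 = "\u00e9"
by_hex8 = "\U000000e9"

escapes = [by_name, by_hex4, by_hex8]

# Every escape expands to a string of length 1 (one code point).
print([len(s) for s in escapes])

# All three notations denote the same character.
print(by_name == by_hex4 == by_hex8)

# Round-trip back to the character's Unicode name.
print(unicodedata.name(by_name))
```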

Cheers,
Nick.

P.S. It occurs to me that the str.format mini-language has no such
limitation, though:

>>> def hexchr(x):
...     return chr(int(x, 16))
...
>>> def hex2str(s):
...     return "".join(hexchr(x) for x in s.split())
...
>>> class chrformat:
...     def __format__(self, fmt):
...         return hex2str(fmt)
...
>>> "{:40 60 1234 e9}".format(chrformat())
'@`ሴé'
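
The same idea as a self-contained script, as a rough sketch: the class
name HexChars is hypothetical, and the only thing assumed is the
documented behaviour that str.format and the built-in format() pass
everything after the colon to __format__ unchanged.

```python
class HexChars:
    """Hypothetical helper: treats the format spec as hex code points."""

    def __format__(self, spec):
        # spec is the raw text after the colon, e.g. "40 60 1234 e9".
        # Split on whitespace and convert each hex number to a character.
        return "".join(chr(int(part, 16)) for part in spec.split())


# str.format hands the spec to __format__ verbatim...
via_method = "{:40 60 1234 e9}".format(HexChars())

# ...and so does the built-in format() function.
via_builtin = format(HexChars(), "40 60 1234 e9")

print(via_method == via_builtin)
print([hex(ord(c)) for c in via_method])
```

Because the spec is free-form text, a single replacement field can
expand to any number of code points, unlike a string-literal escape.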

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

