[issue12753] \N{...} neglects formal aliases and named sequences from Unicode charnames namespace
Ezio Melotti
report at bugs.python.org
Sun Oct 2 08:46:26 CEST 2011
Ezio Melotti <ezio.melotti at gmail.com> added the comment:
> The problem with official names is that they have things in them that
> you do not expect in names. Do you really and truly mean to tell
> me you think it is somehow **good** that people are forced to write
> \N{LINE FEED (LF)}
> Rather than the more obvious pair of
> \N{LINE FEED}
> \N{LF}
> ??
Actually, Python doesn't seem to support \N{LINE FEED (LF)}, most likely because that's a Unicode 1.0 name, and nowadays these code points are simply marked as '<control>'.
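The standard unicodedata module makes the '<control>' point concrete: U+000A, the character behind \n, has no character name at all, only the general category Cc (control). This sketch deliberately calls only unicodedata.name() and unicodedata.category(). Whether \N{...} accepts some other spelling for such characters may vary between Python versions, so that part is not tested here.

```python
import unicodedata

lf = "\n"  # U+000A

# Control characters have no Name property in the Unicode database,
# so name() falls back to the supplied default instead of a string.
print(unicodedata.name(lf, None))  # prints: None

# Their general category is "Cc" (Other, control).
print(unicodedata.category(lf))    # prints: Cc

# Without a default, asking for the name raises ValueError.
try:
    unicodedata.name(lf)
except ValueError as exc:
    print("ValueError:", exc)
```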
> If so, then I don't understand that. Nobody in their right
> mind prefers "\N{LINE FEED (LF)}" over "\N{LINE FEED}" -- do they?
They probably don't, but they just write \n anyway. I don't think we need to support any of these aliases, especially if they are not defined in the Unicode standard.
I'm also not sure humans use \N{...}: you don't want to write
'R\N{LATIN SMALL LETTER E WITH ACUTE}sum\N{LATIN SMALL LETTER E WITH ACUTE}'
and you would need to look up the exact name somewhere anyway before using it (unless you know them by heart).
If 'R\xe9sum\xe9' or 'R\u00e9sum\u00e9' are too obscure and/or magic, you can always print() them and get 'Résumé' (or just write 'Résumé' directly in the source).
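For comparison, the three escape spellings above produce exactly the same string, and the "look up the exact name somewhere" step can be done with unicodedata itself. A minimal sketch; it keeps the source ASCII-only so the result does not depend on how an editor stores the accented letter.

```python
import unicodedata

# Three ASCII-only spellings of the same string.
by_name = 'R\N{LATIN SMALL LETTER E WITH ACUTE}sum\N{LATIN SMALL LETTER E WITH ACUTE}'
by_hex = 'R\xe9sum\xe9'
by_u16 = 'R\u00e9sum\u00e9'
assert by_name == by_hex == by_u16

# Instead of knowing the name by heart, ask the database for it,
# then check that looking the name up leads back to the same code point.
name = unicodedata.name('\xe9')
print(name)                                # prints: LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.lookup(name) == '\xe9')  # prints: True
```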
> All of the standards documents *talk* about things like LRO and ZWNJ.
> I guess the standards aren't "readable" then, right? :)
Right, I had to read down to the table with the meanings before figuring out what they were (and I've already forgotten).
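For reference, this is what those abbreviations stand for. ABBREVIATIONS is a hypothetical, hand-written table made up for illustration. It is not part of Python; the full names come from unicodedata.name().

```python
import unicodedata

# Hypothetical hand-written table: abbreviations used in the standards
# documents, mapped to the code points they denote.
ABBREVIATIONS = {
    'LRO': '\u202d',
    'RLO': '\u202e',
    'ZWNJ': '\u200c',
    'ZWJ': '\u200d',
}

# Print each abbreviation with its code point and official character name.
for abbr, char in ABBREVIATIONS.items():
    print(f'{abbr:5} U+{ord(char):04X} {unicodedata.name(char)}')
```

The official long names are already accepted by \N{...}, e.g. '\N{LEFT-TO-RIGHT OVERRIDE}' == '\u202d'; it is only the short forms that need such a table.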
> The most persuasive use-case for user-defined names is for private-use
> area code points. These will never have an official name. But it is
> just fine to use them. Don't they deserve a better name, one that
> makes sense within your own program that uses them? Of course they do.
>
> For example, Apple has a bunch of private-use glyphs they use all the time.
> In the 8-bit MacRoman encoding, the byte 0xF0 represents the Apple corporate
> logo/glyph thingie of an apple with a bite taken out of it. (Microsoft
> also has a bunch of these.) If you upgrade MacRoman to Unicode, you will
> find that that 0xF0 maps to code point U+F8FF using the regular converter.
>
> Now what are you supposed to do in your program when you want a named character
> there? You certainly do not want to make users put an opaque magic number
> as a Unicode escape. That is always really lame, because the whole reason
> we have \N{...} escapes is so we don't have to put mysterious unreadable magic
> numbers in our code!!
>
> So all you do is
> use charnames ":alias" => {
>     "APPLE LOGO" => 0xF8FF,
> };
>
> and now you can use \N{APPLE LOGO} anywhere within that lexical scope. The
> compiler will dutifully resolve it to U+F8FF, since all name lookups happen
> at compile-time. And it cannot leak out of the scope.
This is actually a good use case for \N{...}.
One way to solve that problem is:
apples = {
    'APPLE': '\uF8FF',
    'GREEN APPLE': '\U0001F34F',
    'RED APPLE': '\U0001F34E',
}
and then:
print('I like {GREEN APPLE} and {RED APPLE}, but not {APPLE}.'.format(**apples))
This requires a format() call for each string and it's a workaround, but at least it's readable (I hope you don't have too many apples in your strings).
I guess we could add some way to define a global list of names, and that would probably be enough for most applications. Making it per-module would be more complicated and maybe not too elegant.
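A module-level dictionary is probably the simplest form such a "global list of names" could take. The sketch below is purely hypothetical: CUSTOM_NAMES and expand_names are made up for illustration and are not a Python feature. It expands \N{...} placeholders written in raw strings at run time, checking the user-defined names first and falling back to unicodedata.lookup() for official names.

```python
import re
import unicodedata

# Hypothetical global registry of user-defined character names.
CUSTOM_NAMES = {
    'APPLE LOGO': '\uF8FF',  # private-use code point that MacRoman 0xF0 maps to
}

# Matches a literal \N{...} placeholder and captures the name inside the braces.
_NAME_ESCAPE = re.compile(r'\\N\{([^}]+)\}')

def expand_names(text):
    r"""Replace \N{NAME} placeholders in text, custom names first."""
    def replace(match):
        name = match.group(1)
        if name in CUSTOM_NAMES:
            return CUSTOM_NAMES[name]
        # Official names go through the Unicode database;
        # an unknown name raises KeyError.
        return unicodedata.lookup(name)
    return _NAME_ESCAPE.sub(replace, text)

# Raw string, so Python itself does not try to resolve \N{APPLE LOGO}.
s = expand_names(r'Logo: \N{APPLE LOGO}, accent: \N{LATIN SMALL LETTER E WITH ACUTE}')
```

Unlike Perl's lexically scoped charnames aliases, this resolves names at run time and the registry is shared by every caller, which is exactly the trade-off a per-module scheme would have to address.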
> People who write patterns without whitespace for cognitive chunking (plus
> comments for explanation) are wicked wicked wicked. Frankly I'm surprised
> Python doesn't require it. :)/2
I actually find those *less* readable. If there's something fancy in the regex, a comment *before* it is welcome, but having to read a regex split across several lines and mentally strip out meaningless whitespace and redundant comments just makes parsing it more difficult for me.
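To make the two styles concrete, here is one pattern written both ways. The pattern, which matches a \N{...} escape and captures the name, is just an illustrative choice; re.VERBOSE is the standard flag that makes whitespace insignificant and allows # comments inside the pattern.

```python
import re

# Compact style, with a comment before it:
# match a \N{...} escape and capture the character name.
compact = re.compile(r'\\N\{([^}]+)\}')

# Verbose style: the same pattern split over several lines,
# with insignificant whitespace and inline comments.
verbose = re.compile(r"""
    \\N        # a literal backslash followed by N
    \{         # opening brace
    ([^}]+)    # the character name: anything up to the closing brace
    \}         # closing brace
""", re.VERBOSE)

sample = r'caf\N{LATIN SMALL LETTER E WITH ACUTE}'
print(compact.findall(sample) == verbose.findall(sample))  # prints: True
```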
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12753>