[Python-ideas] Unicode Name Aliases keyword argument abbreviation in unicodedata.name for missing names

Steven D'Aprano steve at pearwood.info
Thu Jul 12 13:27:04 EDT 2018


On Thu, Jul 12, 2018 at 03:11:59PM +0000, Robert Vanden Eynde wrote:

[Stephen] 
> I don't understand what you're asking for.  The Unicode Standard
> already provides canonical names.
> 
> Not for control characters.

That's because the Unicode Consortium considers that control characters 
have no canonical name. And I think that they are right.
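
For illustration, here is a minimal sketch of how this shows up in Python's unicodedata module. name() raises ValueError for a control character unless a default is passed, while lookup() accepts the formal name aliases (Python 3.3 and later). The helper name_or_default and its placeholder string are hypothetical, made up for this example.

```python
import unicodedata

def name_or_default(ch, default="<no canonical name>"):
    # Hypothetical helper: unicodedata.name() raises ValueError when a
    # character has no name, unless a default is given as second argument.
    return unicodedata.name(ch, default)

# Control characters such as newline have no canonical name.
try:
    unicodedata.name("\n")
    newline_has_name = True
except ValueError:
    newline_has_name = False

newline_label = name_or_default("\n")

# The reverse direction does know the alias: lookup() accepts name
# aliases, so "LINE FEED" resolves to U+000A.
newline_from_alias = unicodedata.lookup("LINE FEED")
```

So the alias is usable for going from a name to a character, but name() itself gives nothing back for U+000A.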


> About the Han case, they all have a 
> unicodedata.name don't they ? (Sorry if I 
> misread your message)

I think that the point Stephen is making is that the canonical name for 
most Han characters is terribly uninformative, even to native Han users. 
For English speakers, the analogous situation would be if name("A") 
returned "LATIN CAPITAL LETTER 0041".
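
To make that concrete, here is a small sketch comparing the two cases using only unicodedata.name(). The variable names are just for this example. U+4E2D is picked arbitrarily as a common Han character.

```python
import unicodedata

han_char = "\u4e2d"  # a common Han character, chosen arbitrarily

# The canonical name of a unified ideograph is derived from its code point.
han_name = unicodedata.name(han_char)

# A Latin letter gets a descriptive name instead.
latin_name = unicodedata.name("A")

# The suffix of the Han name is simply the hex code point.
code_point_suffix = "{:04X}".format(ord(han_char))
```

Here han_name is "CJK UNIFIED IDEOGRAPH-4E2D" while latin_name is "LATIN CAPITAL LETTER A", so the Han name tells the reader nothing beyond the code point.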

There are good reasons for that, but it does mean that if your intention 
is to report the name of the character to a non-technical end-user, in 
their own native language, using the Unicode name or even any of the 
aliases is probably not a great solution.

On the other hand, if you are in a lucky enough situation (unlike 
Stephen) of being able to say "Han characters? We'll fix that in the 
next version..." using the Unicode name is not a terrible solution.

At least, it's The Standard terrible solution *wink*


-- 
Steve

