unicodedata name for \u000a

"Martin v. Löwis" martin at v.loewis.de
Sun Aug 22 05:39:10 EDT 2004


Tor Iver Wilhelmsen wrote:
> 000A LF <control>
> = LINE FEED (LF)
> 
> So the authors of unicodedata.name() could have picked either
> '<control>', the ASCII name 'LF' or the alternative 'LINE FEED (LF)'.

No. <control> is not a character name. The unicodedata.name function
returns the official character name, so it MUST NOT return an alias
(which rules out your second alternative).

> Not picking any of them seems strange, and as the OP pointed out,
> leads to an error even though the "C0 Controls" part of that page *is*
> part of Unicode.

Yes. However, this strangeness originates from the Unicode
specification. Control characters simply do not have a name.
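
A minimal sketch of what this looks like in practice. The helper name_or_none is hypothetical, introduced only for illustration; it relies on the optional default argument of unicodedata.name(), without which the function raises ValueError for a character that has no name:

```python
import unicodedata

def name_or_none(ch):
    """Return the official Unicode name of ch, or None if it has none."""
    # Without the second argument, unicodedata.name() raises ValueError
    # for a character that has no official name.
    return unicodedata.name(ch, None)

print(name_or_none("\u000a"))  # a C0 control character: no official name
print(name_or_none("A"))       # an ordinary named character
```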

If you want to know whether a code point is an unassigned character,
check whether unicodedata.category returns "Cn" for it.
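
A minimal sketch of that check, using unicodedata.category(), the function in the standard unicodedata module that returns a character's general category. The helper is_unassigned is hypothetical, and U+0378 is used on the assumption that it is still unassigned in the Unicode database shipped with your Python:

```python
import unicodedata

def is_unassigned(ch):
    """Return True if ch has general category Cn (not assigned)."""
    return unicodedata.category(ch) == "Cn"

# LINE FEED is an assigned control character (category Cc) that simply
# has no name, so it is not reported as unassigned.
print(unicodedata.category("\u000a"))
print(is_unassigned("\u000a"))

# Assumption: U+0378 is a reserved, unassigned code point in this
# version of the Unicode database.
print(is_unassigned("\u0378"))
```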

Regards,
Martin


