Differences between \N escapes and unicodedata

eryk sun eryksun at gmail.com
Sat Aug 6 00:25:26 EDT 2016


On Sat, Aug 6, 2016 at 3:13 AM, Chris Angelico <rosuav at gmail.com> wrote:
>>>> unicodedata.lookup("NULL")
> '\x00'
>>>> "\N{NULL}"
> '\x00'
>>>> unicodedata.name(_)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: no such name
>
> Tested on 3.4, 3.5, and 3.6. Extremely odd.

U+0000 has a legacy name and alias names in the standard, but no primary name:

http://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt
http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt
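
As a rough illustration of what those files say about U+0000, here is a minimal sketch that splits the relevant records on ';'. The record strings are reproduced from memory of the 8.0.0 files, so treat them as an assumption and check them against the URLs above. The variable names are hypothetical.

```python
# Records for U+0000, reproduced here as an assumption; verify against the
# UnicodeData.txt and NameAliases.txt files linked above.
UNICODE_DATA_RECORD = "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;"
NAME_ALIAS_RECORDS = ["0000;NULL;control", "0000;NUL;abbreviation"]

fields = UNICODE_DATA_RECORD.split(";")
primary_name = fields[1]    # "<control>" is a placeholder label, not a real name
legacy_name = fields[10]    # the Unicode 1.0 name field

# Each alias record is code point, alias, alias type.
aliases = [tuple(rec.split(";")) for rec in NAME_ALIAS_RECORDS]

print(primary_name, legacy_name, aliases)
```

The "<control>" entry in the name field is why there is no primary name to return, while "NULL" appears only as the legacy name and as an alias.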

lookup() also searches the aliases, which Python maps into a private
use area (U+F0000 - U+F01CB), and of course maps a matching alias back
to the correct character code.
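
The asymmetry can be reproduced on an unmodified interpreter with something like the following sketch. It assumes Python 3.3 or later, where lookup() and \N{...} accept aliases. The variable names are just for illustration.

```python
import unicodedata

# lookup() and \N{...} both resolve the alias to the real code point,
# not to the internal private-use slot.
ch = unicodedata.lookup("NULL")
same_as_escape = (ch == "\N{NULL}" == "\x00")

# name() only knows primary names, so it raises for U+0000 ...
try:
    unicodedata.name(ch)
    raised = False
except ValueError:
    raised = True

# ... unless a default is supplied as the second argument.
fallback = unicodedata.name(ch, "<no primary name>")

# The private-use code points themselves have no name either.
pua_name = unicodedata.name("\U000F0000", None)

print(same_as_escape, raised, fallback, pua_name)
```

Passing a default to name() is the simplest way to avoid the ValueError when a character may have no primary name.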

For the following I hacked unicodedata.name() to allow returning names
for the alias range. Notice that there are multiple aliases for a
given character, straight from the above-mentioned NameAliases
database.

    >>> import textwrap, unicodedata
    >>> names = [unicodedata.name(chr(i)) for i in range(0xf0000, 0xf01cb)]
    >>> print(*textwrap.wrap(', '.join(names[:80])), sep='\n')
    NULL, NUL, START OF HEADING, SOH, START OF TEXT, STX, END OF TEXT,
    ETX, END OF TRANSMISSION, EOT, ENQUIRY, ENQ, ACKNOWLEDGE, ACK, ALERT,
    BEL, BACKSPACE, BS, CHARACTER TABULATION, HORIZONTAL TABULATION, HT,
    TAB, LINE FEED, NEW LINE, END OF LINE, LF, NL, EOL, LINE TABULATION,
    VERTICAL TABULATION, VT, FORM FEED, FF, CARRIAGE RETURN, CR, SHIFT
    OUT, LOCKING-SHIFT ONE, SO, SHIFT IN, LOCKING-SHIFT ZERO, SI, DATA
    LINK ESCAPE, DLE, DEVICE CONTROL ONE, DC1, DEVICE CONTROL TWO, DC2,
    DEVICE CONTROL THREE, DC3, DEVICE CONTROL FOUR, DC4, NEGATIVE
    ACKNOWLEDGE, NAK, SYNCHRONOUS IDLE, SYN, END OF TRANSMISSION BLOCK,
    ETB, CANCEL, CAN, END OF MEDIUM, EOM, SUBSTITUTE, SUB, ESCAPE, ESC,
    INFORMATION SEPARATOR FOUR, FILE SEPARATOR, FS, INFORMATION SEPARATOR
    THREE, GROUP SEPARATOR, GS, INFORMATION SEPARATOR TWO, RECORD
    SEPARATOR, RS, INFORMATION SEPARATOR ONE, UNIT SEPARATOR, US, SP,
    DELETE, DEL
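
Without patching unicodedata, a similar reverse mapping can be approximated by feeding known alias names through lookup(). This is only a sketch: SAMPLE_ALIASES is a hand-picked subset of the list above, and aliases_by_char and describe() are hypothetical helpers, not part of the standard library.

```python
import unicodedata
from collections import defaultdict

# A few aliases taken from the list above (not the full NameAliases data).
SAMPLE_ALIASES = [
    "NULL", "NUL", "START OF HEADING", "SOH",
    "LINE FEED", "NEW LINE", "END OF LINE", "LF", "NL", "EOL",
]

# Group the aliases by the character that lookup() resolves them to.
aliases_by_char = defaultdict(list)
for alias in SAMPLE_ALIASES:
    aliases_by_char[unicodedata.lookup(alias)].append(alias)

def describe(ch):
    """Return the primary name, else the known aliases, else a placeholder."""
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    known = aliases_by_char.get(ch)
    return " / ".join(known) if known else "<unnamed>"

print(describe("\x00"))
print(describe("\n"))
print(describe("A"))
```

This only covers aliases you already know to ask about. Enumerating all of them would still require parsing NameAliases.txt or a patched name() as above.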


