[Python-ideas] Proposal for default character representation

Mikhail V mikhailwas at gmail.com
Wed Oct 12 17:33:11 EDT 2016


Hello all,

I want to share my thoughts about syntax improvements regarding
character representation in Python.
I am new to the list, so if such a discussion or a PEP already exists,
please let me know.

So in short:

Currently Python uses hexadecimal notation
for characters, both for input and output.
For example, take the Unicode string "абв.txt"
(a file named with the first three Cyrillic letters).

Now, displaying its repr() we get:

u'\u0430\u0431\u0432.txt'

So we see hex numbers here.
The same goes for typing such strings in, since escape sequences
in literals also use hex, and for some parts of the Python
documentation, especially those about Unicode strings.
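For comparison, in Python 3 the built-in ascii() function still renders non-ASCII characters with the hexadecimal \u escapes shown above, while ord() returns the decimal code point of each character. A small illustration using only built-ins:

```python
s = "абв.txt"

# ascii() escapes every non-ASCII character as a hexadecimal \u sequence
print(ascii(s))  # '\u0430\u0431\u0432.txt'

# The same code points written in hex and in decimal
for ch in "абв":
    print(ch, hex(ord(ch)), ord(ch))  # e.g. а 0x430 1072
```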

PROPOSAL:
1. Remove all hex notation from printing functions, typing and
documentation.
For printing functions, keep hex as an "option"
for those who feel the need for a hex representation,
which is strange IMO.
2. Replace it with decimal notation; in this case, for example:

u'\u0430\u0431\u0432.txt' becomes
u'\u1072\u1073\u1074.txt'

and similarly in other cases where raw bytes must be printed or input.
So, to summarize: make decimal notation the standard in all cases.
I am not going into further detail, such as how many digits (leading zeros)
to use, since that is a secondary decision.
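A minimal sketch of what the proposed decimal rendering could look like, written as a hypothetical helper decimal_repr (not an existing Python API). It assumes ASCII characters stay unchanged and every other character is written as a backslash, "u", and its decimal code point; the hypothetical width parameter pads with leading zeros, defaulting to no padding since the digit count is left open above. Real Python literals still read \u as hex, so this is for display only.

```python
def decimal_repr(s: str, width: int = 0) -> str:
    """Hypothetical helper: escape non-ASCII characters using decimal code points."""
    parts = []
    for ch in s:
        code = ord(ch)
        if code < 128:
            parts.append(ch)  # ASCII characters are left as-is
        else:
            # zfill(width) pads with leading zeros; width=0 means no padding
            parts.append("\\u" + str(code).zfill(width))
    return "".join(parts)


print(decimal_repr("абв.txt"))           # \u1072\u1073\u1074.txt
print(decimal_repr("абв.txt", width=5))  # \u01072\u01073\u01074.txt
```

Without a fixed width, a decimal escape followed by a digit (for example "а1" rendering as \u10721) cannot be split back unambiguously, which is one practical reason the digit-count question would eventually need an answer.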

MOTIVATION:
1. Hex notation is hardly readable. It was not designed with readability
in mind, so it is not an appropriate system for reading, at least with the
current character set, which is a mix of digits and letters (I wonder who
was the wise person who invented such a set).
2. Mixing two notations (hex and decimal) is a _very_ bad idea;
I hope there is no need to explain why.

So that's it, in short.
Feel free to discuss and comment.

Regards,
Mikhail

