[Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

Mikhail V mikhailwas at gmail.com
Thu Dec 8 13:37:12 EST 2016


On 8 December 2016 at 17:52, Chris Angelico <rosuav at gmail.com> wrote:

> In the first place, many people have pointed out to you that Unicode
> *is* laid out best in hexadecimal.

OK, if it is intentionally aligned on a binary grid, then obviously
hex numbers will show some patterns, but who is arguing with that?

And to be fair, from my examples for Cyrillic:
Range start points in hex vs decimal:

capitals:
U+0410    #1040
lowercase:
U+0430    #1072

So I need to remember one number, 1040. Then, if I know that
there are 32 letters (excluding Ё), I just add 1040 + 32 and get
1072, which is the beginning of the lowercase range.
There are of course people who can efficiently add and
subtract hex in their head, but I am not one of them
(guess who is in the minority here), and there is no need to do
it in this case. So if I know the distances between ranges,
I can do it all much more easily in my head.
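
The arithmetic above can be checked with a few lines of Python.
This is only a sketch: the constant names are made up for the
illustration, and the values are the ones quoted in this message.

```python
# Hypothetical names, chosen for this illustration only.
CAPITALS_START = 1040     # U+0410, CYRILLIC CAPITAL LETTER A
LETTERS_IN_RANGE = 32     # contiguous letters, Ё excluded

# Start of the lowercase range, computed in decimal.
lowercase_start = CAPITALS_START + LETTERS_IN_RANGE

print(lowercase_start)        # 1072
print(hex(lowercase_start))   # 0x430
print(chr(CAPITALS_START), chr(lowercase_start))  # А а
```

Both notations name the same code points; the only difference is
which one the reader finds easier to add in their head.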

Not a strong argument?
To be more pedantic: if you know that the Russian
alphabet has exactly 33 letters, and not 32 as one
might conclude from the Unicode table, you may also have
noticed that the letter Ё is U+0401 and ё is U+0451.

This means they are torn away from the other letters and
do not even lie in the range. In practice, this means that
if I want to filter by code ranges, I need to
additionally check for the values U+0401 and U+0451.
Is that not because someone decided to align
the alphabet in such a way? Alignment is not a bad idea,
but it should not contradict common sense.
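
A range filter of the kind described above might look like the
following sketch. The function name is hypothetical, and the end of
the range (1103, U+044F) is my assumption, derived from 1072 + 31;
the two extra values are the code points of Ё and ё given above.

```python
def is_russian_letter(ch: str) -> bool:
    """Return True if the single character ch is a Russian letter.

    Hypothetical helper for illustration only.
    """
    code = ord(ch)
    # Contiguous block: А (1040, U+0410) through я (1103, U+044F).
    in_main_range = 1040 <= code <= 1103
    # Ё (1025, U+0401) and ё (1105, U+0451) lie outside that block,
    # so they need an additional check.
    is_yo = code in (1025, 1105)
    return in_main_range or is_yo


print(is_russian_letter("Ж"))  # True
print(is_russian_letter("ё"))  # True
print(is_russian_letter("z"))  # False
```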

> You have to show
> that decimal isn't just marginally better than hex; you have to show
> that there are situations where the value of decimal character
> literals is so great that it's worth forcing everyone to learn two
> systems. And I'm not convinced you've even hit the first point.

Frankly, I don't fully understand your point here. Everyone knows
decimal. The address of an element in a table is a number, and in
most cases I don't need to learn it by heart, since it is already
known and written in some table on the PC.

Also, inputting characters by decimal code is a very common thing.
Alt key combos (Alt+0192) are very well
established, and many people *do* learn decimal code
points by heart, including me. So now it is you who
wants me to learn two numbering systems for no reason.

And even with all that said, it is not the strongest argument.
Most important is that hex notation is an ugly circumstance,
and in this case there is too little reason to introduce it
into an algorithm which just checks ranges and specific
values. And for *specific single* values it is absolutely
irrelevant which alignment you have.
You just choose what is more readable and/or more common
for abstract numbers. But that is another big question, and
the current hex notation does not fall into the category
"more readable" anyway.


Mikhail

