Language design

Ben Finney ben+python at benfinney.id.au
Wed Sep 11 20:57:04 EDT 2013


Mark Janssen <dreamingforward at gmail.com> writes:

> > Unicode is not 16-bit any more than ASCII is 8-bit. And you used the
> > word "encod[e]", which is the standard way to turn Unicode into bytes
> > anyway. No, a Unicode string is a series of codepoints - it's most
> > similar to a list of ints than to a stream of bytes.
>
> Okay, now you're in blah, blah land.

Text is (in the third millennium) Unicode.

Unicode text is not binary data and never will be.

Unicode text can be *encoded* to binary data, and that data can be
*decoded* back to Unicode text. The two are never the same thing.
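In Python 3, for example, the distinction is concrete: text is `str`, binary data is `bytes`, and only `encode`/`decode` bridge them. A minimal sketch (UTF-8 chosen here purely for illustration):

```python
# Text (str) and binary data (bytes) are distinct types in Python 3.
text = "café"                      # Unicode text: a sequence of code points
data = text.encode("utf-8")       # *encoding* the text produces bytes
round_trip = data.decode("utf-8") # *decoding* the bytes recovers the text

assert isinstance(text, str)
assert isinstance(data, bytes)
assert round_trip == text   # the round trip is lossless
assert text != data         # but text and its encoding are never the same thing
```

The same `str` yields different `bytes` under different codecs (`"café".encode("latin-1")` is not `"café".encode("utf-8")`), which is exactly why the two kinds of value must be kept apart.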

You're demonstrating my point: the pernicious “text is binary data”
falsehood needs to be eradicated from everything today's programmers
learn. We need the simple facts about the basic difference between text
and bytes to be learned by every programmer as early as is feasible.

-- 
 \                   德不孤、必有鄰。 (The virtuous are not abandoned, |
  `\                               they shall surely have neighbours.) |
_o__)                             —孔夫子 Confucius, 551 BCE – 479 BCE |
Ben Finney



