[Python-Dev] Python 3.x and bytes

Wed May 18 18:16:44 CEST 2011

Robert Collins writes:

 > It's probably too late to change, but please don't try to argue that
 > it's correct: the continued confusion of folk running into this is
 > evidence that confusion *is happening*. Treat that as evidence and
 > think about how to fix it going forward.

Sorry, Rob, but you're just wrong here, and Nick is right.  It's
possible to improve Python 3, but not to "fix" it in this respect.
The Python 3 solution is correct, the Python 2 approach is not.
There's no way to avoid discontinuity and confusion here.

Confusion is indeed happening, but it's real confusion in the way
people think about the problem space, not a language design cockup.
The problem can't be solved by embedding ASCII in Unicode, because
non-ASCII bytes don't have a canonical embedding in Unicode.  I.e., the
situation is inherently confusing.  You can't wish it away, you can
only choose to impose more or less of it on particular constituencies.
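
To make the "no canonical embedding" point concrete, here is a minimal
Python 3 sketch.  The sample byte string and the choice of codecs are
my own assumptions for illustration, not anything from the thread: the
ASCII prefix decodes the same way everywhere, while the single
non-ASCII byte maps to a different character (or to an error)
depending on which encoding you guess.

```python
# Hypothetical sample: three ASCII bytes followed by one non-ASCII byte.
raw = b"caf\xe9"

# The ASCII prefix has exactly one sensible reading.
ascii_part = raw[:3].decode("ascii")

# The trailing byte 0xE9 depends entirely on the assumed encoding.
as_latin1 = raw.decode("latin-1")   # LATIN SMALL LETTER E WITH ACUTE
as_cp1251 = raw.decode("cp1251")    # CYRILLIC SMALL LETTER SHORT I

# Under UTF-8 the same bytes are not valid text at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(ascii_part, as_latin1, as_cp1251, utf8_ok)
```

None of these readings is more "right" than the others without
out-of-band knowledge of the encoding, which is why the choice cannot
be made implicitly on the programmer's behalf.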

Now, it's quite possible that there are other correct approaches that
allow straightforward manipulation of non-ASCII text, but I don't know
what they are, and I don't know anybody else who does.