Python's 8-bit cleanness deprecated?

Scott David Daniels Scott.Daniels at Acm.Org
Tue Feb 4 15:41:51 EST 2003


Roman Suzi wrote:
> ...
> There is no ambiguity in raw 8-bit. What if I have no text at all,
> just some bytes with value > 127?

But raw 8-bit is about _bytes_, and the issue is characters.  As
I imagine it, a raw 8-bit encoding would allow anything in "normal"
strings, but only ASCII in unicode strings.  That is really the
worst of all possible worlds.  If you've declared the encoding,
the compiler can "know" the value of the expression:
     ord("?") + ord(u"?")
Otherwise, it does not have a chance.
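Here is a minimal sketch of why the declaration matters.  It is written in modern Python 3, where bytes plays the role of an old 8-bit "..." literal and str plays the role of a u"..." literal.  It assumes the "?" above stands for some non-ASCII character, and it uses é (U+00E9) as a hypothetical stand-in.  The names byte_ord and expression_value are illustrative only and are not part of Python.

```python
# Sketch in Python 3: bytes stands in for an 8-bit "..." literal and
# str for a u"..." literal.  CH (é, U+00E9) is a hypothetical stand-in
# for the non-ASCII character the archive shows as "?".
CH = "\u00e9"


def byte_ord(raw):
    """Mimic ord() on an 8-bit string: it needs exactly one byte."""
    if len(raw) != 1:
        raise TypeError("ord() expected a character, got %d bytes" % len(raw))
    return raw[0]


def expression_value(source_encoding):
    """Value of ord("é") + ord(u"é") if the file is saved in source_encoding."""
    eight_bit = CH.encode(source_encoding)            # bytes the 8-bit literal holds
    unicode_text = eight_bit.decode(source_encoding)  # what u"..." decodes to
    return byte_ord(eight_bit) + ord(unicode_text)


print(expression_value("latin-1"))  # 466 (233 + 233)
try:
    expression_value("utf-8")
except TypeError as exc:
    print("utf-8:", exc)  # the 8-bit literal is 2 bytes, so ord() fails
```

When the file is saved as Latin-1, é is the single byte 0xE9, so the expression is 233 + 233 = 466.  When the file is saved as UTF-8, the 8-bit literal is two bytes and ord() on it fails.  Without a declaration, the compiler cannot tell which case applies.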

> Let's make -*- necessary for ASCII as well - and watch at the 
> reaction of Python users ;)

The trick is to allow a system in which you can read and interpret
the first few lines (I think we've settled on two) in order to get
the -*- line understood.  UTF-8 would be the default if it weren't
so Western-European-centric (talk to Chinese or Japanese programmers
about how efficient UTF-8 is: most of their characters take three
bytes in UTF-8 but only two in their native encodings).
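Here is a rough sketch, in Python 3, of reading the first two lines to find the -*- declaration.  It uses the regular expression given in PEP 263.  The function name find_source_encoding and its "ascii" fallback are assumptions made for illustration.  The real default has changed over time: later Python 2 releases used ASCII, and Python 3 uses UTF-8 (PEP 3120).  The sketch is also simplified.  It ignores a UTF-8 byte-order mark, and it does not apply modern CPython's rule that line 2 is only examined when line 1 is blank or a comment.  The standard library's tokenize.detect_encoding() handles both of those cases.

```python
import re

# Coding-declaration pattern from PEP 263, applied to raw bytes so the
# file can be inspected before its encoding is known.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_source_encoding(source, default="ascii"):
    """Return the encoding declared in the first two lines of source bytes.

    Hypothetical helper: falls back to `default` when no declaration is found.
    """
    for line in source.splitlines()[:2]:  # only lines 1 and 2 are examined
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


script = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nprint('hi')\n"
print(find_source_encoding(script))                               # latin-1
print(find_source_encoding(b"x = 1\n"))                           # ascii
print(find_source_encoding(b"a = 1\nb = 2\n# coding: utf-8\n"))   # ascii (line 3 ignored)
```

Because the search only needs ASCII-compatible bytes, the compiler can find the declaration before it knows how to decode the rest of the file.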

-Scott David Daniels
-Scott.Daniels at Acm.Org




