[Python-Dev] Some thoughts on the codecs...

Tim Peters tim_one@email.msn.com
Tue, 16 Nov 1999 02:47:06 -0500


[Guido]
> ...
> While I'm on the topic, I don't see in your proposal a description of
> the source file character encoding.  Currently, this is undefined, and
> in fact can be (ab)used to enter non-ASCII in string literals.
> ...
> What should we do about this?  The safest and most radical solution is
> to disallow non-ASCII source characters; François will then have to
> type
>
>   print u"Written by Fran\u00E7ois."
>
> but, knowing François, he probably won't like this solution very much
> (since he didn't like the \347 version either).

So long as Python opens source files using libc text mode, it can't
guarantee more than C does:  the presence of any character other than tab,
newline, and ASCII 32-126 inclusive renders the file contents undefined.
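
To make that limit concrete, here is a minimal sketch in present-day Python that scans raw bytes for anything outside the set named above (tab, newline, ASCII 32-126). The names PORTABLE and nonportable_bytes are hypothetical, made up for illustration.

```python
# Hypothetical helper; the allowed set is exactly the one named in the text:
# tab (9), newline (10), and ASCII 32-126 inclusive.
PORTABLE = frozenset([9, 10]) | frozenset(range(32, 127))

def nonportable_bytes(data):
    """Return (offset, byte) pairs for every byte outside the portable set."""
    return [(i, b) for i, b in enumerate(data) if b not in PORTABLE]

# The c-cedilla in "François" is byte 0xE7 in Latin-1, so it gets flagged.
sample = 'print("Written by Fran\u00e7ois.")\n'.encode("latin-1")
bad = nonportable_bytes(sample)
```

A pure-ASCII file yields an empty list; anything else is content that text mode does not promise to hand back unchanged.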

Go beyond that, and you've got the same problem as mailers and browsers, and
so also the same solution:  open source files in binary mode, and add a
pragma specifying the intended charset.
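
A rough sketch of that approach, in present-day Python: read the file in binary mode, look for a charset pragma on the first line, and decode accordingly. The pragma spelling ("# pragma charset: NAME"), the function name read_source, and the ASCII fallback are all assumptions made for illustration; nothing above fixes a syntax.

```python
import os
import re
import tempfile

# Hypothetical pragma form; no concrete syntax is specified in the text.
PRAGMA = re.compile(rb"^#\s*pragma\s+charset\s*[:=]\s*([-\w.]+)")

def read_source(path, default="ascii"):
    """Read raw bytes, then decode with the charset named on the first line."""
    with open(path, "rb") as f:  # binary mode: no libc text-mode translation
        raw = f.read()
    first_line = raw.split(b"\n", 1)[0]
    match = PRAGMA.match(first_line)
    charset = match.group(1).decode("ascii") if match else default
    return raw.decode(charset)

# Usage: write a Latin-1 encoded file carrying the pragma, then read it back.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(b'# pragma charset: latin-1\nprint("Written by Fran\xe7ois.")\n')
text = read_source(tmp.name)
os.remove(tmp.name)
```

With the ASCII fallback used here, a file that has no pragma but does contain a high byte fails to decode instead of being silently misread.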

As a practical matter, declare that Python source is Latin-1 for now, and
declare any *system* that doesn't support that non-conforming <wink>.
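
One reason Latin-1 is a convenient stopgap: every byte value 0-255 maps to the code point with the same number, so decoding can never fail and always round-trips. A small check in present-day Python (variable names are illustrative only):

```python
# Every possible byte value, decoded as Latin-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Byte N becomes code point N, and encoding restores the original bytes.
same_numbers = [ord(c) for c in decoded] == list(range(256))
round_trips = decoded.encode("latin-1") == all_bytes

# The octal escape \347 and \u00E7 from the quoted message name the same value.
cedilla_matches = (0o347 == 0xE7) and (decoded[0xE7] == "\u00e7")
```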

python-is-the-measure-of-all-things-ly y'rs  - tim