[Python-Dev] PEP 263 -- Python Source Code Encoding

Guido van Rossum guido@python.org
Tue, 26 Feb 2002 15:37:05 -0500


> > I missed this.  Why not default to ASCII like any decent programming
> > language does in the absence of an explicit encoding?
> 
> Jack had the same question. The simple answer is: we need this
> in order to maintain backward compatibility when we move to
> phase two of the implementation.
> 
> Here's the longer one:
> 
> ASCII is the standard encoding for Python keywords and identifiers. 
> There is no standard source code encoding for string literals. 
> Unicode literals are interpreted using 'unicode-escape' which 
> is an enhanced Latin-1 with escape semantics.
> 
> This makes Latin-1 the right choice:
> 
> * Unicode literals already use it today

But they shouldn't, IMO.

We should require an explicit encoding when more than ASCII is used,
and I'd like to enforce this.
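
A rough sketch of what such enforcement could look like, written in present-day Python 3 rather than anything that existed at the time. The helper names (declared_encoding, check_source) and the exact regex are assumptions made for illustration, not the PEP's wording or the interpreter's code.

```python
import re

# Loosely modelled on the PEP's "magic comment"; the exact pattern is an assumption.
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def declared_encoding(source: bytes):
    """Return the encoding named in a comment on the first two lines, or None."""
    for line in source.splitlines()[:2]:
        if line.lstrip().startswith(b"#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1).decode("ascii")
    return None


def check_source(source: bytes) -> str:
    """Return the encoding to use, rejecting undeclared non-ASCII source."""
    encoding = declared_encoding(source)
    if encoding is not None:
        return encoding
    try:
        source.decode("ascii")
    except UnicodeDecodeError:
        # No declaration and bytes outside ASCII: refuse instead of guessing.
        raise SyntaxError("non-ASCII bytes but no encoding declared") from None
    return "ascii"
```

Pure-ASCII files pass without a comment; anything else has to say what it is.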

> * As soon as we get to phase two of the implementation,
>   8-bit string literals will have to make the round trip
>   raw binary -> Unicode -> raw binary and this only works
>   if you make Latin-1 the default.

Sorry, I don't understand what you're trying to say here.  Can you
explain this with an example?  Why can't we require any program
encoded in more than pure ASCII to have an encoding magic comment?  I
guess I don't understand what you mean by "raw binary".
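
For reference, here is one possible reading of the round trip, sketched in present-day Python 3 and assuming "raw binary" simply means the undecoded bytes of an 8-bit string literal. The helper round_trips is hypothetical, and this is a guess at the argument, not the PEP author's own example.

```python
def round_trips(raw: bytes, encoding: str) -> bool:
    """True if decoding and re-encoding with `encoding` gives back `raw`."""
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeError:
        # Some byte (or byte sequence) has no meaning in this encoding.
        return False


# Every possible byte value, as might appear in an 8-bit string literal.
all_bytes = bytes(range(256))

latin1_ok = round_trips(all_bytes, "latin-1")  # each byte maps to one code point
ascii_ok = round_trips(all_bytes, "ascii")     # bytes >= 0x80 cannot be decoded
utf8_ok = round_trips(all_bytes, "utf-8")      # lone high bytes are invalid UTF-8
```

Under that reading, Latin-1 is the only one of the three defaults that preserves arbitrary bytes. It does not by itself show that a default is needed once a declaration is required.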

Once you've explained it to me, the PEP should address this issue.

--Guido van Rossum (home page: http://www.python.org/~guido/)