Python's 8-bit cleanness deprecated?

Fri Feb 7 18:45:36 EST 2003

On Friday 07 February 2003 04:39 pm, Kirill Simonov wrote:
> But what is the price that we pay for this? The millions of Python
> scripts that use 8-bit string literals or comments are broken now in
> order to allow the feature that no one ever used! I think that this is
> an extreme.
>
> And I can propose a perfect solution. If there is no defined encoding
> for a source file, assume that it uses a simple 8-bit encoding. Do not
> convert the file into UTF-8 in the tokenizer. And do not convert string
> literals in the compiler. Raise SyntaxError if a non-ASCII character is
> contained in a Unicode literal. We will even save a few CPU cycles
> for most Python source files using this approach.

I *support* your patch, and I think that lots of people will do the same. It
makes sense to keep the current behavior for existing files, while at the
same time extending the parser to support a more advanced encoding. It
*does not* make sense to change the parser and break old code.

UTF-8 files should specify the encoding. Existing 8-bit files should be left 
as they are today. 
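
To make the rule concrete, here is a rough sketch of the fallback in modern
Python. It is not the actual tokenizer code and not Kirill's patch. The names
declared_encoding and decode_source are made up for illustration, the regular
expression is only in the spirit of the PEP 263 "coding" comment, and latin-1
is an assumption: it stands in for "some unknown 8-bit encoding" because it
maps every byte to exactly one code point, so the original bytes survive a
round trip.

```python
import re

# Simplified pattern in the spirit of a PEP 263 coding comment (assumption:
# the real tokenizer has more rules, e.g. BOM handling, not covered here).
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes):
    """Return the encoding named in the first two lines, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def decode_source(source: bytes) -> str:
    """Decode with the declared encoding, else treat the file as plain 8-bit."""
    encoding = declared_encoding(source)
    if encoding is None:
        # No declaration: leave existing 8-bit files alone. latin-1 is a
        # hypothetical stand-in that keeps every byte intact.
        return source.decode("latin-1")
    return source.decode(encoding)


if __name__ == "__main__":
    legacy = b"s = '\xe9t\xe9'\n"                     # old 8-bit file, no header
    modern = "# -*- coding: utf-8 -*-\ns = 'été'\n".encode("utf-8")
    print(declared_encoding(legacy))                  # no declaration found
    print(declared_encoding(modern))                  # declaration found
    # The legacy bytes are recoverable unchanged after decoding.
    print(decode_source(legacy).encode("latin-1") == legacy)
```

With this rule an undeclared file is never re-encoded behind the author's
back, and a UTF-8 file only needs one comment line at the top to opt in.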

Carlos Ribeiro
cribeiro at mail.inet.com.br