Python's 8-bit cleanness deprecated?
Carlos Ribeiro
cribeiro at mail.inet.com.br
Fri Feb 7 18:45:36 EST 2003
On Friday 07 February 2003 04:39 pm, Kirill Simonov wrote:
> But what is the price that we pay for this? The millions of Python
> scripts that use 8-bit string literals or comments are broken now in
> order to allow the feature that no one ever used! I think that this is
> an extreme.
>
> And I can propose a perfect solution. If there is no defined encoding
> for a source file, assume that it uses a simple 8-bit encoding. Do not
> convert the file into UTF-8 in the tokenizer. And do not convert string
> literals in the compiler. Raise SyntaxError if a non-ASCII character is
> contained in a Unicode literal. We will even save a few CPU cycles
> for most Python source files using this approach.
I *support* your patch, and I think that lots of people will do the same. It
makes sense to keep the current behavior for existing files, while at the
same time extending the parser to support a more advanced encoding. It
*does not* make sense to change the parser and break old code.
UTF-8 files should specify the encoding. Existing 8-bit files should be left
as they are today.
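
A minimal sketch of the rule discussed above, written in modern Python and not taken from the actual patch: look for a PEP 263 style coding declaration on the first two lines, and if there is none, treat a file with high-bit bytes as a legacy 8-bit file to be left untouched. The function names (declared_encoding, has_non_ascii, classify) and the returned labels are hypothetical, chosen only for illustration; the real tokenizer logic is more involved.

```python
import re

# Coding-declaration pattern in the style of PEP 263.
# Assumption: only the first two lines of the file are inspected.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(source: bytes):
    """Return the declared encoding name, or None if nothing is declared."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def has_non_ascii(source: bytes) -> bool:
    """True if any byte has the high bit set."""
    return any(byte > 0x7F for byte in source)


def classify(source: bytes) -> str:
    """Hypothetical policy: how should this source file be handled?"""
    encoding = declared_encoding(source)
    if encoding is not None:
        # The file opted in, so decode it with the declared encoding.
        return "declared:" + encoding
    if has_non_ascii(source):
        # No declaration: keep the bytes as they are, as for existing files.
        return "legacy-8bit"
    return "ascii"
```

For example, classify(b"s = 'caf\xe9'\n") reports a legacy 8-bit file because nothing is declared, while a file starting with "# -*- coding: utf-8 -*-" is reported as declared and would be decoded accordingly.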
Carlos Ribeiro
cribeiro at mail.inet.com.br