Python's 8-bit cleanness deprecated?

Kirill Simonov kirill_simonov at mail.ru
Fri Feb 7 11:39:56 EST 2003


* M.-A. Lemburg <mal at lemburg.com>:
> No, but they'll need to pay some lucky Python programmer to
> get rid of the warning :-) Seriously, the warning and the trouble
> are intended, as I already mentioned in the bug report Kirill
> filed on SF: http://www.python.org/sf/681960/ :

Sorry, but I'm not convinced. I hope you still have the patience to
hear my objections.

I've inspected the current implementation. The file encoding does not
affect ordinary string literals. First the tokenizer converts them
from the file encoding into UTF-8. Then the compiler converts them back
from UTF-8 into the file encoding. Thus the result is the same regardless
of which encoding you declare. Comments are simply discarded by the
tokenizer. Why do you want them to be in any particular encoding if their
encoding doesn't matter?
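
To illustrate the round trip described above, here is a minimal sketch
in modern Python. It is not the actual tokenizer or compiler code; the
function name round_trip and the sample literal are made up for the
example, and it assumes every byte of the literal is valid in the
declared encoding.

```python
def round_trip(raw: bytes, file_encoding: str) -> bytes:
    """Mimic the two conversions applied to an ordinary string literal."""
    # Step 1 (tokenizer, as described above): file encoding -> UTF-8.
    as_utf8 = raw.decode(file_encoding).encode("utf-8")
    # Step 2 (compiler, as described above): UTF-8 -> file encoding.
    return as_utf8.decode("utf-8").encode(file_encoding)


# A hypothetical 8-bit literal as it would be stored in a KOI8-R file.
literal = "Привет".encode("koi8_r")

# The bytes come back unchanged, so the declared encoding had no
# visible effect on the ordinary string literal.
unchanged = round_trip(literal, "koi8_r") == literal
```

If the bytes are not valid in the declared encoding, the first step
fails instead of passing them through, which is exactly where existing
8-bit scripts run into trouble.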

Well, I understand. The file encoding is defined for the whole file.
So comments and string literals must be in this encoding too.
And that way we can define Unicode literals using our favourite encoding.

But what price do we pay for this? The millions of Python
scripts that use 8-bit string literals or comments are now broken in
order to allow a feature that no one has ever used! I think this is
too extreme.

And I can propose a perfect solution. If no encoding is declared for a
source file, assume that it uses a simple 8-bit encoding. Do not
convert the file into UTF-8 in the tokenizer, and do not convert string
literals in the compiler. Raise SyntaxError if a Unicode literal
contains a non-ASCII character. We will even save a few CPU cycles
for most Python source files using this approach.
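
The rule proposed above could be sketched roughly as follows. This is
only an illustration, not the patch itself: the function name
check_undeclared_source and both regular expressions are assumptions
made for the example, and the literal matcher is deliberately naive (it
ignores triple-quoted strings, escaped quotes, and literals that appear
inside comments).

```python
import re

# Looks for a "coding: name" declaration in a comment line.
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")
# Naive matcher for single-line u"..." / u'...' literals (illustration only).
UNICODE_LITERAL_RE = re.compile(rb"""\b[uU][rR]?("[^"\n]*"|'[^'\n]*')""")


def check_undeclared_source(source: bytes) -> None:
    """Apply the proposed rule to a source file given as raw bytes."""
    for line in source.splitlines()[:2]:
        if line.lstrip().startswith(b"#") and CODING_RE.search(line):
            return  # an encoding is declared, so the proposal does not apply
    # No declaration: 8-bit bytes in ordinary literals and comments are left
    # untouched; only Unicode literals are required to be pure ASCII.
    for match in UNICODE_LITERAL_RE.finditer(source):
        if any(byte > 0x7F for byte in match.group(0)):
            raise SyntaxError(
                "non-ASCII character in Unicode literal "
                "without a declared encoding"
            )


# Ordinary 8-bit literals and comments pass without complaint.
check_undeclared_source(b's = "\xf0\xd2"\n# \xcb comment\n')

# A Unicode literal with a non-ASCII byte is rejected.
try:
    check_undeclared_source(b'u = u"\xf0\xd2"\n')
    rejected = False
except SyntaxError:
    rejected = True
```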

I will write a patch if you agree with this solution.

> This whole thing is one more step in the direction of
> explicit is better than implicit and opens up Python
> for many more languages such as, for example, Asian
> scripts.

If you need a Pythonic quote, here is one:
    "Practicality beats purity"

-- 
xi




