[I18n-sig] Re: Strawman Proposal: Encoding Declaration V2

Paul Prescod paulp@ActiveState.com
Sun, 11 Feb 2001 14:31:12 -0800


Fredrik Lundh wrote:
> 
> > A source file with an encoding declaration must only use non-ASCII bytes
> > in places that can legally support Unicode characters. In Python 2.x the
> > only place is within a Unicode literal
> 
> make that "in a string literal".

Yes, I think you're right. If a person needs to get at a Latin 1
character in a string literal they should be able to do so using 

> if an encoding directive is present, the *entire* file should be
> assumed to use that encoding.  this applies to comments, 8-bit
> string literals, and 16-bit string literals.

I've backed off somewhat on having the file be pre-decoded in the short
term. My major conceptual problem is that if we decode to Unicode-escaped
ASCII or something similar, we mess up the column numbers, and the
positions reported in syntax errors will not be right. We might really
need to have a Unicode-aware parser before we can do this...
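
To illustrate the column-number concern, here is a minimal sketch. It
assumes a hypothetical pre-decoding step that rewrites each non-ASCII
character as a \uXXXX escape before parsing. The helper name
escape_to_ascii and the sample line are made up for illustration, and
the code is written in present-day Python, not anything from the
proposal itself.

```python
# Illustration only: a hypothetical pre-decoding pass, not part of the proposal.
def escape_to_ascii(line):
    """Replace every non-ASCII character with a six-character \\uXXXX escape."""
    return "".join(
        ch if ord(ch) < 128 else "\\u%04x" % ord(ch)
        for ch in line
    )

# A source line with one non-ASCII character (e-acute) and a syntax error ("= =").
original = 'name = u"caf\u00e9"; x = = 1'
escaped = escape_to_ascii(original)

# Column where the parser would notice the error, before and after escaping.
col_original = original.index("= =")
col_escaped = escaped.index("= =")

# Each escaped character turns 1 column into 6, so everything after it
# on the line moves right by 5 columns.
shift = col_escaped - col_original

print(escaped)
print("column in original text:", col_original)
print("column in escaped text: ", col_escaped)
```

An error reported against the escaped text would point five columns too
far right for every non-ASCII character earlier on the line, unless the
parser keeps a mapping back to the original positions.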

 Paul Prescod