Python's 8-bit cleanness deprecated?

Tue Feb 4 13:22:17 EST 2003

In article <mailman.1044380830.22886.python-list at python.org>,
 Jeff Epler <jepler at unpythonic.net> wrote:

> On Tue, Feb 04, 2003 at 01:36:04PM +0100, Just wrote:
> > Here's a possible compromise (which I'm not sure is implementable at 
> > all): Python could only issue warnings if 8-bit chars are used in string 
> > literals, and not if they only occur in comments.
> 
> What makes you believe that Python can tell what is a comment and what
> is a string without knowing the encoding?

This is not about knowing the encoding but about warning when an 
encoding _should_ have been specified. Since the encoding, whatever it 
is, must be a superset of ASCII, I don't see why my suggestion wouldn't 
work (bar implementation limitations). That's not to say I'm completely 
convinced of the idea myself.
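
To make the suggestion concrete, here is a minimal sketch in present-day Python, not how the 2.3 tokenizer works or would have to work. It assumes the source is an ASCII superset in which comment, quote and newline bytes keep their ASCII meaning, and it decodes as Latin-1 only so that every byte >= 0x80 survives as one character. The function name is made up for illustration.

```python
import io
import tokenize


def non_ascii_string_literals(source_bytes):
    """Return (line_number, literal) pairs for string literals holding bytes >= 0x80.

    Assumption: the real encoding is an ASCII superset, so '#', quotes and
    newlines mean what they mean in ASCII. Latin-1 is used purely as a
    byte-preserving decoding, not as a guess at the real encoding.
    """
    text = source_bytes.decode("latin-1")
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        # Comments arrive as COMMENT tokens and are simply never inspected.
        if tok.type == tokenize.STRING and any(ord(ch) > 127 for ch in tok.string):
            hits.append((tok.start[0], tok.string))
    return hits
```

Fed a file with an 8-bit character in a comment on line 1 and another inside a string literal on line 3, this would report only line 3, which is the behaviour I have in mind for the warning.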

> I think the only limitation of the source file encoding is that it must
> be an ASCII superset.  So for instance I could have a perverse encoding
> where 0x81 decodes to u'\n', and 0x83 is another valid character in the
> encoding 's'.  Then this byte string
>     '#\x81"\x83"\x81'
> actually decodes to
>     u'#\n"\uXXXX"\n'
> which means the file contains a string with high-bit-set chars used in
> a string literal.
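
For reference, the quoted scenario can be reproduced with a small sketch. The mapping below is hypothetical: 0x81 -> newline is taken from the quoted example, while the character chosen for 0x83 is arbitrary (the quoted text leaves it as \uXXXX). All names are invented for illustration.

```python
import io
import tokenize

# Hypothetical "perverse" ASCII-superset mapping; the target of 0x83 is an
# arbitrary stand-in for the unspecified \uXXXX in the quoted example.
PERVERSE = {0x81: "\n", 0x83: "\u00e9"}


def decode_perverse(data):
    """Decode bytes with the mapping above; all other bytes map to chr(byte)."""
    return "".join(PERVERSE.get(b, chr(b)) for b in data)


def comment_and_string_tokens(text):
    """List (token_name, token_text) for COMMENT and STRING tokens only."""
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(text).readline)
        if tok.type in (tokenize.COMMENT, tokenize.STRING)
    ]


raw = b'#\x81"\x83"\x81'
decoded = decode_perverse(raw)
tokens = comment_and_string_tokens(decoded)
```

Under this mapping the comment ends at the 0x81 byte and the 0x83 byte lands inside a string literal on the following line, which is the situation the quoted paragraph describes.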

I don't see your point: my suggestion is about reducing the warning 
irritation for people using 8-bit encodings in comments of code that 
works *now* (in Python <= 2.2), not about bizarre things you _could_ do 
with perverse encoding directives in 2.3.

Just