Proposal: require 7-bit source str's

Thu Aug 5 17:15:39 EDT 2004

"Hallvard B Furuseth" <h.b.furuseth at usit.uio.no> wrote in message
news:HBF.20040805p736 at bombur.uio.no...
> Now that the '-*- coding: <charset> -*-' feature has arrived,
> I'd like to see an addition:
>
>   # -*- str7bit:True -*-
>
>   After the source file has been converted to Unicode, cause a parse
>   error if a non-u'' string contains a non-7bit source character.
>
> It can be used to ensure that the source file doesn't contain national
> characters that the program will treat as characters in the current
> locale's character set instead of in the source file's character set.
>
> An environment variable or command line option to set this for all
> files would also be very useful (and -*- str7bit:False -*- to override
> it), so one can easily check someone else's code for trouble spots.
>
> Possibly an s'' syntax or something would also be useful for non-
> Unicode strings that intentionally contain national characters.
>
> I dislike the '7bit' part of the name - it's misleading both because
> one can get 8-bit strings e.g. with the '\x<hex>' notation (a feature,
> not a bug) and because some 'valid' characters will be 8-bit in
> character sets like EBCDIC.  However, I can't think of a better name.
>
> Comments?
> Has it been discussed before?
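A rough sketch of how the proposed check could be prototyped outside the interpreter, using Python 3's standard `tokenize` module. Everything here is hypothetical: `check_str7bit`, `str7bit_enabled` and `non_ascii_byte_strings` are made-up names, the `default` parameter stands in for the proposed environment variable or command-line option, and placing the directive on line 1 or 2 is an assumption borrowed from the PEP 263 coding cookie. `tokenize.tokenize()` decodes the file according to its coding cookie before yielding tokens, so the test runs "after the source file has been converted to Unicode" as described above. Because Python 3's tokenizer is used, Python 2-only prefixes such as ur'' are not recognised.

```python
import io
import re
import tokenize

# Hypothetical directive from the proposal: "# -*- str7bit:True -*-".
STR7BIT_RE = re.compile(r"-\*-.*?\bstr7bit:\s*(True|False)\b.*?-\*-")


def str7bit_enabled(source: bytes, default: bool = False) -> bool:
    """Return the str7bit setting found on line 1 or 2, else `default`.

    Assumption: like the PEP 263 coding cookie, the directive must appear
    in the first two lines. `default` models the proposed (hypothetical)
    environment variable or command-line option; an explicit
    str7bit:False in the file overrides it.
    """
    for raw_line in source.splitlines()[:2]:
        match = STR7BIT_RE.search(raw_line.decode("ascii", "replace"))
        if match:
            return match.group(1) == "True"
    return default


def non_ascii_byte_strings(source: bytes) -> list[tuple[int, str]]:
    """Return (line, literal) for non-u'' literals with non-ASCII characters."""
    problems = []
    # tokenize.tokenize() honours the coding cookie, so tok.string is
    # already decoded text rather than raw bytes.
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # The prefix is every letter before the opening quote, e.g. "u", "r", "b".
        prefix = re.match(r"[A-Za-z]*", tok.string).group(0)
        if "u" in prefix.lower():
            continue  # Unicode literals are exempt from the check.
        # Escapes such as '\xe9' are plain ASCII in the source and pass.
        if any(ord(ch) > 127 for ch in tok.string):
            problems.append((tok.start[0], tok.string))
    return problems


def check_str7bit(source: bytes, default: bool = False) -> list[tuple[int, str]]:
    """Run the literal check only when str7bit is switched on for this file."""
    if not str7bit_enabled(source, default):
        return []
    return non_ascii_byte_strings(source)
```

Given a Latin-1 file whose first line is `# -*- coding: latin-1; str7bit:True -*-` and which contains `a = 'café'`, `b = u'café'` and `c = '\xe9'`, this sketch would report only the `a` literal: the `u''` literal is exempt, and the hex escape is ASCII in the source text.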

Is this even an issue? If you specify UTF-8 as the character
set, I can't see how non-Unicode strings could contain
anything other than 7-bit ASCII, for the simple reason that
the interpreter wouldn't know which encoding to use for them.
(Of course, hex escapes would still be legal, as would
constructed strings, strings read in at run time, and so forth.)
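A small illustration, using only Python 3's standard codecs, of why the encoding matters for non-Unicode strings: the same character becomes different bytes in different charsets, and bytes written in one charset are silently misread under another. It shows the general problem the proposal targets, not what the Python 2 parser itself does with such literals.

```python
# One character, U+00E9 LATIN SMALL LETTER E WITH ACUTE.
text = "\u00e9"

# Its byte representation depends entirely on the chosen charset.
utf8_bytes = text.encode("utf-8")      # b'\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'\xe9'

# A program that assumes its locale is Latin-1 misreads UTF-8 bytes
# as two unrelated characters instead of one.
misread = utf8_bytes.decode("latin-1")  # 'Ã©'
print(utf8_bytes, latin1_bytes, misread)
```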

On the other hand, I don't know whether it actually works this
way, and PEP 263 seems to be completely uninformative
on the issue.

John Roth