Proposal: require 7-bit source str's

Hallvard B Furuseth h.b.furuseth at usit.uio.no
Thu Aug 5 16:24:19 EDT 2004


Now that the '-*- coding: <charset> -*-' feature has arrived,
I'd like to see an addition:

  # -*- str7bit:True -*-

  After the source file has been converted to Unicode, raise a parse
  error if a non-u'' string literal contains a non-7-bit source character.
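A minimal sketch of the proposed check, written as a standalone Python 3 script rather than as a change to the parser. Assumptions: `find_non_ascii_str_literals` and `check_file` are hypothetical names. Unprefixed and b'' literals are treated as the "non-u''" strings of the proposal (Python 2 semantics, even though plain literals are already Unicode in Python 3). f-strings, which Python 3.12+ tokenizes as separate token types, are not examined. `tokenize.open()` decodes the file according to its coding declaration, which corresponds to the "after conversion to Unicode" step.

```python
import io
import tokenize

def find_non_ascii_str_literals(source):
    """Hypothetical checker: return (line, col, text) for every string
    literal without a u/U prefix whose source text contains a character
    outside 7-bit ASCII."""
    problems = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # The prefix is the run of letters before the opening quote.
        body = tok.string.lstrip("bBrRuUfF")
        prefix = tok.string[:len(tok.string) - len(body)].lower()
        if "u" in prefix:
            continue  # u'' literals may legitimately hold national characters
        if any(ord(ch) > 0x7F for ch in tok.string):
            problems.append((tok.start[0], tok.start[1], tok.string))
    return problems

def check_file(path):
    # tokenize.open() decodes using the file's "-*- coding: ... -*-" cookie.
    with tokenize.open(path) as f:
        return find_non_ascii_str_literals(f.read())
```

Note that an escape such as '\xe9' is not flagged: the check looks at the characters in the source, not at the value the literal produces.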

It can be used to ensure that the source file doesn't contain national
characters that the program will treat as characters in the current
locale's character set instead of in the source file's character set.

An environment variable or command line option to set this for all
files would also be very useful (and -*- str7bit:False -*- to override
it), so one can easily check someone else's code for trouble spots.
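One way the per-file marker and a global switch could interact is sketched below, with the file's own marker taking precedence as described above. Assumptions: the environment variable name `PYTHONSTR7BIT` and the function `str7bit_enabled` are hypothetical, and the marker is only searched for in the first two lines, mirroring the rule for the coding declaration.

```python
import os
import re

# Hypothetical: matches "str7bit:True" or "str7bit:False" inside a -*- ... -*- comment,
# possibly alongside other settings such as "coding: latin-1;".
_PRAGMA = re.compile(r"-\*-.*?\bstr7bit\s*:\s*(True|False)\b.*?-\*-")

def str7bit_enabled(source, environ=os.environ):
    # Like the coding cookie, only the first two lines are examined.
    for line in source.splitlines()[:2]:
        stripped = line.lstrip()
        if not stripped.startswith("#"):
            continue
        m = _PRAGMA.search(stripped)
        if m:
            return m.group(1) == "True"  # per-file setting overrides the global one
    # No marker in the file: fall back to the hypothetical global switch.
    return environ.get("PYTHONSTR7BIT", "") == "1"
```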

Possibly an s'' syntax or something similar would also be useful for
non-Unicode strings that intentionally contain national characters.

I dislike the '7bit' part of the name.  It is misleading both because
one can get 8-bit strings, e.g. with the '\x<hex>' notation (a feature,
not a bug), and because some 'valid' characters will be 8-bit in
character sets like EBCDIC.  However, I can't think of a better name.
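Both naming problems can be seen in a few lines of Python 3. The literal below is written entirely in 7-bit source characters yet yields an 8-bit byte, and in EBCDIC (code page 037 is used here as one example) even an ordinary ASCII letter encodes above 0x7F:

```python
import ast

source_text = r"b'\xe9'"                # the literal as it appears in a source file
source_is_7bit = all(ord(ch) < 0x80 for ch in source_text)
value = ast.literal_eval(source_text)   # the object the literal evaluates to
print(source_is_7bit, value[0])         # True 233: 7-bit source, 8-bit byte

# In EBCDIC code page 037, the plain letter 'a' encodes as 0x81.
print("a".encode("cp037"))              # b'\x81'
```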

Comments?
Has it been discussed before?

-- 
Hallvard


