Proposal: require 7-bit source str's

Hallvard B Furuseth h.b.furuseth at usit.uio.no
Fri Aug 6 03:06:33 EDT 2004


Martin v. Löwis wrote:
>Hallvard B Furuseth wrote:
>> Now that the '-*- coding: <charset> -*-' feature has arrived,
>> I'd like to see an addition:
>> 
>>   # -*- str7bit:True -*-
>> 
>>   After the source file has been converted to Unicode, cause a parse
>>   error if a non-u'' string contains a non-7bit source character.
>> 
>> It can be used to ensure that the source file doesn't contain national
>> characters that the program will treat as characters in the current
>> locale's character set instead of in the source file's character set.
> 
> I doubt this helps as much as you'd like. You will need to change every
> source file with that annotation.

perl -i.bak -pe '
    /\bstr7bit\b/ or
    s/^(\s*#.*?-\*-.*?coding[=:]\s*[\w.-]+)(?=[;\s])/$1;str7bit:True/
' `find . -name '*.py' | xargs grep -l 'coding[=:]'`
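
For anyone without perl at hand, roughly the same rewrite can be sketched in Python. This is only an illustration: mark_str7bit is a hypothetical helper name, the regular expression is carried over from the one-liner above, and str7bit itself is still just the proposed marker, not something Python recognises.

```python
import re

# Same pattern as the Perl one-liner: a comment line with a -*- coding -*-
# declaration, captured up to the end of the charset name.
CODING_RE = re.compile(r'^(\s*#.*?-\*-.*?coding[=:]\s*[\w.-]+)(?=[;\s])')


def mark_str7bit(line):
    """Append the proposed ';str7bit:True' marker to a coding line.

    Lines that already mention str7bit, or that carry no coding
    declaration, are returned unchanged.
    """
    if re.search(r'\bstr7bit\b', line):
        return line
    return CODING_RE.sub(r'\1;str7bit:True', line, count=1)
```

Applied line by line to each file that has a coding declaration, this does the same job as the perl -i.bak run, minus the backup files.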

> While you are at it, you could just
> as well check every source file directly.

True at first pass, but if Python catches it, a file will stay
clean once it has been cleaned up and marked as str7bit.  That's
particularly useful when several people are working on the source.

A fix to your objection would be to instead warn about the
offending strings _unless_ the file is marked with str7bit:False,
but I figure that's a bit too drastic for the time being:-)

> So if anything, I think this should be a global option.

-W::str7bitWarning?

Come to think of it, that would also make it possible for a Python
program to reject add-ons (modules, execfile, etc.) that contain
unmarked 8-bit strings.

> Or, better yet,
> external checkers like pychecker could check for that.

Well, I don't think that's better, but if it's rejected for Python
that'll be my next stop.
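
To make the external-checker idea concrete, here is a minimal sketch of what such a check might look like. It is an assumption-laden illustration, not pychecker code: find_8bit_strs is a hypothetical name, and it is written against the tokenize module of a current Python 3, which reads the source as already-decoded text. It flags every string literal without a u prefix that contains a non-7-bit source character; escapes such as \xe5 are plain ASCII in the source and pass.

```python
import io
import tokenize


def find_8bit_strs(source):
    """Return (line_number, literal) pairs for string literals that lack
    a u prefix and contain a non-ASCII source character."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        literal = tok.string
        # The prefix (u, r, b, ...) is whatever precedes the first quote.
        quote_pos = min(
            pos for pos in (literal.find("'"), literal.find('"')) if pos != -1
        )
        prefix = literal[:quote_pos].lower()
        if "u" in prefix:
            continue  # explicitly marked as a Unicode string
        if any(ord(ch) > 127 for ch in literal):
            hits.append((tok.start[0], literal))
    return hits
```

A wrapper would decode each file according to its coding declaration, call find_8bit_strs on the text, and report the hits. Unlike the str7bit marker, nothing forces anyone to run it, which is why I'd still prefer the check in Python itself.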

-- 
Hallvard


