Proposal: require 7-bit source str's

Fri Aug 6 02:18:30 EDT 2004

Hallvard B Furuseth wrote:

> Peter Otten wrote:
>>Hallvard B Furuseth wrote:
>> 
>>> Now that the '-*- coding: <charset> -*-' feature has arrived,
>>> I'd like to see an addition:
>>> 
>>>   # -*- str7bit:True -*-
>>> 
>>>   After the source file has been converted to Unicode, cause a parse
>>>   error if a non-u'' string contains a non-7bit source character.
>> 
>> Could
>> 
>> # -*- coding: ascii -*-
>> 
>> be sufficient?
> 
> No.  It would be used together with coding: <non-ascii charset>.  The
> point is to ensure that all non-ASCII strings are u'' strings instead
> of plain strings.

OK.
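
You can check directly why `coding: ascii` cannot stand in for the proposed flag. The sketch below is minimal and assumes a modern Python 3 interpreter, not the Python 2.3 of this thread. The helper `compiles()` is hypothetical. An ASCII declaration rejects a non-ASCII byte even inside a u'' literal, while a Latin-1 declaration accepts it. The proposed flag would keep the Latin-1 declaration and reject only non-ASCII text in plain strings.

```python
# Sketch: "# -*- coding: ascii -*-" rejects every non-ASCII byte in the file,
# including bytes inside u'' literals, so it cannot be combined with a
# non-ASCII source charset such as Latin-1.

def compiles(source: bytes) -> bool:
    """Return True if the interpreter accepts the given source bytes."""
    try:
        compile(source, "<demo>", "exec")
    except (SyntaxError, UnicodeDecodeError):
        # Depending on the CPython version, a source-decoding failure may
        # surface as either exception type, so both are caught.
        return False
    return True

# The same u'' literal, containing the raw Latin-1 byte 0xE9 (e-acute).
latin1_src = b"# -*- coding: latin-1 -*-\nname = u'Andr\xe9'\n"
ascii_src = b"# -*- coding: ascii -*-\nname = u'Andr\xe9'\n"

print(compiles(latin1_src))  # True: byte 0xE9 decodes as U+00E9
print(compiles(ascii_src))   # False: the ASCII codec rejects byte 0xE9
```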

>> Why would you reintroduce ambiguity with your s-prefixed
>> strings?
> 
> For programs that work with non-Unicode output devices or files and
> know which character set they use.  Which is quite a lot of programs.

I'd say a lot of programs work with non-Unicode data, but many of them don't
know what they are doing, i.e. you cannot move them into an environment with
a different encoding (and if you do, they won't notice).

>> The long-term goal would be unicode throughout, IMHO.
> 
> Whose long-term goal for what?  For things like Internet communication,
> fine.  But there are a lot of less 'global' applications where other
> character encodings make more sense.

Here we disagree. Showing the right image for a character should be the job
of the OS and should work safely across platforms. Why shouldn't I be able to
store a file with a Greek or Chinese name? I wasn't able to quote Martin's
surname correctly for the Python-URL. That's a mess that should be cleaned
up once per OS rather than once per user, and I don't see how that can happen
without Unicode (and only Unicode). Even NASA blunders when it has to deal
with both meters and inches.

> In any case, both a language's short-term and long-term goals should be
> to support current programming, not programming like it 'should be done'
> some day in the future.

Well, Python's integers already work like they 'should be done'. I'm no
expert, but I think Java is closer to the 'real thing' concerning strings.
Perl 6 is going for Unicode, if only to overcome the limitations of its
operator set (they want the yen symbol as a zip operator because it
looks like a zipper :-).
You have to make compromises, and I think an external checker would be the
way to go in your case. If I were to add a switch to Python's string
handling, it would be "all-Unicode". But it may well be that I would curse
it after the first real-world use...
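
An external checker of the kind suggested above could look roughly like the following sketch. It assumes Python 3's `tokenize` module, so it runs on a modern interpreter rather than Python 2.3. It does not understand the Python 2-only `ur''` prefix, and on Python 3.12+ it skips f-strings, which are tokenized separately there. The names `find_8bit_plain_strings` and `check_file` are hypothetical. The sketch applies Hallvard's rule: after the source has been decoded, flag any string literal without a `u` prefix that contains a non-7-bit character.

```python
import io
import tokenize
from typing import Iterator, List, Tuple

Finding = Tuple[int, int, str]  # (line, column, literal text)


def find_8bit_plain_strings(source: str) -> Iterator[Finding]:
    """Yield string literals that contain non-ASCII characters but no u/U prefix.

    `source` is already-decoded text, mirroring the proposal's "after the
    source file has been converted to Unicode" step.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # Only ordinary string tokens; on Python 3.12+ f-strings are split
        # into FSTRING_* tokens and are therefore not checked here.
        if tok.type != tokenize.STRING:
            continue
        # The prefix (u, b, r, rb, ...) is everything before the first quote.
        quote_pos = next(i for i, ch in enumerate(tok.string) if ch in "'\"")
        prefix = tok.string[:quote_pos].lower()
        if "u" in prefix:
            continue  # u'' literals may contain any character
        if any(ord(ch) > 127 for ch in tok.string):
            line, col = tok.start
            yield line, col, tok.string


def check_file(path: str) -> List[Finding]:
    """Check one file; tokenize.open() decodes it using its coding declaration."""
    with tokenize.open(path) as f:
        return list(find_8bit_plain_strings(f.read()))


if __name__ == "__main__":
    sample = (
        "# -*- coding: latin-1 -*-\n"
        "ok = u'Andr\u00e9'\n"
        "bad = 'Andr\u00e9'\n"
    )
    # Reports only the plain string on line 3.
    for line, col, text in find_8bit_plain_strings(sample):
        print(f"line {line}, col {col}: non-ASCII in plain string {text}")
```

Working on tokens rather than on raw lines keeps comments and u'' literals out of the report. Reading files through `tokenize.open()` means the checker honors the same coding declaration the interpreter would.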

Peter