Proposal: require 7-bit source str's
Peter Otten
__peter__ at web.de
Fri Aug 6 02:18:30 EDT 2004
Hallvard B Furuseth wrote:
> Peter Otten wrote:
>>Hallvard B Furuseth wrote:
>>
>>> Now that the '-*- coding: <charset> -*-' feature has arrived,
>>> I'd like to see an addition:
>>>
>>> # -*- str7bit:True -*-
>>>
>>> After the source file has been converted to Unicode, cause a parse
>>> error if a non-u'' string contains a non-7bit source character.
>>
>> Could
>>
>> # -*- coding: ascii -*-
>>
>> be sufficient?
>
> No. It would be used together with coding: <non-ascii charset>. The
> point is to ensure that all non-ASCII strings are u'' strings instead
> of plain strings.
OK.
>> Why would you reintroduce ambiguity with your s-prefixed
>> strings?
>
> For programs that work with non-Unicode output devices or files and
> know which character set they use. Which is quite a lot of programs.
I'd say a lot of programs work with non-Unicode data, but many don't know
which encoding they are using, i.e. you cannot move them into an environment
with a different encoding (if you do, they won't notice the mismatch).
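
To make the "won't notice" part concrete, here is a small sketch in
present-day Python 3, where bytes and text are separate types. The helper
name reread and the two encodings (latin-1 and koi8-r) are arbitrary choices
for illustration, not anything from the proposal:

```python
def reread(text, written_as, read_as):
    """Encode text with one charset and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


original = "caf\u00e9"  # 'cafe' with an acute accent on the e

# Written under a latin-1 locale, read back under a koi8-r one.
misread = reread(original, "latin-1", "koi8-r")

# No exception is raised; the text is simply different now.
print(original == misread)                                  # False
print(reread(original, "latin-1", "latin-1") == original)   # True
```

Both are single-byte encodings, so every byte decodes to something and
nothing signals that the result is wrong.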
>> The long-term goal would be unicode throughout, IMHO.
>
> Whose long-term goal for what? For things like Internet communication,
> fine. But there are a lot of less 'global' applications where other
> character encodings make more sense.
Here we disagree. Showing the right image for a character should be the job
of the OS and should safely work cross-platform. Why shouldn't I be able to
store a file with a Greek or Chinese name? I wasn't able to quote Martin's
surname correctly for the Python-URL. That's a mess that should be cleaned
up once per OS rather than once per user. I don't see how that can happen
without Unicode (only). Even NASA blunders when they have to deal with
meters and inches.
> In any case, a language's both short-term and long-term goals should be
> to support current programming, not programming like it 'should be done'
> some day in the future.
Well, Python's integers already work like they 'should be done'. I'm no
expert, but I think Java is closer to the 'real thing' concerning strings.
Perl 6 is going for Unicode, if only to overcome the limitations of its
operator set (they want the yen symbol as a zipping operator because it
looks like a zipper :-).
You have to make compromises, and I think an external checker would be the
way to go in your case. If I were to add a switch to Python's string
handling, it would be "all-Unicode". But it may well be that I would curse
it after the first real-world use...
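
A rough sketch of what such an external checker might look like, written
against today's standard tokenize module. The function name
find_non_ascii_plain_strings and the sample source are made up for
illustration, and it assumes the source has already been decoded to text.
It only looks at ordinary string tokens, so f-strings on newer Pythons are
not covered:

```python
import io
import tokenize


def find_non_ascii_plain_strings(source):
    """Return (line, literal) pairs for string literals that lack a
    u prefix but contain a non-7-bit source character."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        literal = tok.string
        # The prefix is everything before the first quote character.
        quote_pos = min(
            i for i in (literal.find("'"), literal.find('"')) if i != -1
        )
        prefix = literal[:quote_pos].lower()
        if "u" in prefix:
            continue  # u'' strings may hold any character
        if any(ord(ch) > 127 for ch in literal):
            hits.append((tok.start[0], literal))
    return hits


# Hypothetical source file contents, already decoded to text.
sample = (
    "# -*- coding: utf-8 -*-\n"
    "a = u'gr\u00fc\u00df'\n"
    "b = 'gr\u00fc\u00df'\n"
    "c = 'plain ascii'\n"
)

for line, literal in find_non_ascii_plain_strings(sample):
    print("line %d: non-ASCII character in plain string %s" % (line, literal))
```

Only line 3 of the sample is reported. Escapes such as '\xfc' pass, since
the source characters themselves are 7-bit, which matches the wording of
the proposal.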
Peter