PEP 263 status check

Hallvard B Furuseth h.b.furuseth at usit.uio.no
Fri Aug 6 17:10:44 EDT 2004


An addition to Martin's reply:

John Roth wrote:
>"Martin v. Löwis" <martin at v.loewis.de> wrote in message
>news:41137799.70808 at v.loewis.de...
>>John Roth wrote:
>>
>> To be more specific: In a UTF-8 source file, doing
>>
>> print "ö" == "\xc3\xb6"
>> print "ö"[0] == "\xc3"
>>
>> would print two times "True", and len("ö") is 2.
>> OTOH, len(u"ö")==1.
>>
>>> The point of this is that I don't think that either behavior
>>> is what one would expect. It's also an open invitation
>>> for someone to make an unchecked mistake! I think this
>>> may be Hallvard's underlying issue in the other thread.
>>
>> What would you expect instead? Do you think your expectation
>> is implementable?
> 
> I'd expect that the compiler would reject anything that
> wasn't either in the 7-bit ASCII subset, or else defined
> with a hex escape.

Then you should also expect a lot of people to move to
another language - one whose designers live in the real
world instead of your Utopian Unicode world.

> The reason for this is simply that wanting to put characters
> outside of the 7-bit ASCII subset into a byte character string
> isn't portable.

Unicode isn't portable either.
Try to output a Unicode string to a device (e.g. your terminal)
whose character encoding is not known to the program.
The program will either fail, output the raw UTF-8 bytes, or
guess at whatever character set its author happens to be fond of.
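
A minimal sketch of those three outcomes, assuming a modern Python 3
interpreter (not the 2004-era Python under discussion). The helper
name encode_for_device is hypothetical and exists only for this
illustration:

```python
# Assumption: Python 3, where str is Unicode and encode() returns bytes.
text = "ö"  # U+00F6

def encode_for_device(s, encoding):
    """Hypothetical helper: encode strictly, else fall back to '?'."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        # The device's character set cannot represent the character.
        return s.encode(encoding, errors="replace")

utf8 = text.encode("utf-8")                # two bytes: 0xC3 0xB6
latin1 = text.encode("latin-1")            # one byte: 0xF6
ascii_ = encode_for_device(text, "ascii")  # strict encode fails, so b"?"

# Guessing the wrong character set: UTF-8 bytes read as Latin-1
# come out as two unrelated characters.
mojibake = utf8.decode("latin-1")
```

The same character yields different byte sequences per encoding, so
a program that does not know the device's encoding has no correct
choice to make.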

For that matter, tell me why my programs should spend any time
converting between UTF-8 and the character set the application
actually works with, just because you are fond of Unicode.  That
conversion might take a lot more time than parsing the program
does.  Or tell me why I should have to spell perfectly normal text
strings with hex escapes, if that's what you mean.

And tell me why I shouldn't be allowed to work easily with raw
UTF-8 strings, if I do use coding:utf-8.
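
For reference, the byte-level behaviour quoted at the top of this
message can be sketched as follows. This assumes Python 3, where a
raw UTF-8 string is a bytes object; the variable names are
illustrative only:

```python
# Assumption: Python 3; `raw` stands in for a 2004-era byte-string literal.
raw = "ö".encode("utf-8")        # the bytes a UTF-8 source file would hold
text = raw.decode("utf-8")       # the corresponding Unicode string

same_bytes = raw == b"\xc3\xb6"  # the literal is exactly these two bytes
first_byte = raw[:1] == b"\xc3"  # slice, because raw[0] is an int in Python 3
byte_len = len(raw)              # length counted in bytes
char_len = len(text)             # length counted in characters
```

Here byte_len is 2 and char_len is 1, matching the len("ö") and
len(u"ö") results in the quoted example.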

-- 
Hallvard


