PEP 263 comments
Piet van Oostrum
piet at cs.uu.nl
Wed Feb 27 05:53:08 EST 2002
>>>>> "Jason Orendorff" <jason at jorendorff.com> (JO) writes:
JO> Martin v. Loewis wrote:
>> To make some progress on PEP 263, I suggest that some of the open issues
>> are resolved as follows:
JO> Counter-proposal:
JO> - Comment syntax: none.
JO> - UTF-8 file signature: not supported.
JO> - Python source code encoding: must always be UTF-8.
There are still non-UTF-8 files around, and not everyone has a UTF-8
editor.
JO> - Implementation: within the parser, everything's just
JO> ordinary UTF-8 bytes.
JO> - IDLE: always save UTF-8 unless otherwise directed.
JO> Advantage: simple, universal, easy, similar to what Java does.
Java does accept ISO Latin-1 files as input. In fact, on my machine (Mac
OS X) it doesn't even accept UTF-8 files that start with the UTF-8
signature. And strings containing UTF-8 are interpreted as plain 8-bit
characters, meaning every byte is treated as one character.
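A minimal sketch of the "every byte is a character" effect, written in
Python rather than Java and meant only as an illustration of the byte
arithmetic, not of what any particular Java compiler does. The variable
names are made up for the example:

```python
# Illustration only: Python stands in for the behaviour described above.
# The text is written with escapes so this file itself is pure ASCII.
text = "\u00e9"                              # e with acute accent

source_bytes = text.encode("utf-8")          # two bytes: 0xC3 0xA9
as_latin1 = source_bytes.decode("latin-1")   # each byte becomes one character
as_utf8 = source_bytes.decode("utf-8")       # the two bytes form one character

print(len(source_bytes), len(as_latin1), len(as_utf8))
```

Read as Latin-1, the single accented letter turns into two unrelated
characters, which is what happens when UTF-8 string contents are taken
byte by byte.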
JO> No confusion about embedded 0x22 bytes in strings. Also,
JO> stylistically I prefer not to have a document specify its own
JO> encoding, or for comments to affect the meaning of a source
JO> file.
Which 0x22 bytes?
To a certain extent it is possible to autodetect whether a file with 8-bit
characters (highest bit set) could be UTF-8, but it is error-prone.
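A rough sketch of such a check, in Python, under the assumption that
"could be UTF-8" means "contains bytes with the high bit set and decodes
without error". The function name looks_like_utf8 is hypothetical, and
this is one possible heuristic, not a proposal for the PEP:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Guess whether 8-bit data is UTF-8 (heuristic, can be wrong)."""
    # Pure ASCII gives no evidence either way, so report False.
    if not any(b >= 0x80 for b in data):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Genuine UTF-8 is accepted.
print(looks_like_utf8("\u00e9".encode("utf-8")))

# A lone Latin-1 accented letter is not valid UTF-8, so it is rejected.
print(looks_like_utf8("\u00e9".encode("latin-1")))

# False positive: the Latin-1 text "\u00c3\u00a9" (A-tilde, copyright sign)
# is the byte pair 0xC3 0xA9, which is also valid UTF-8.
print(looks_like_utf8("\u00c3\u00a9".encode("latin-1")))
```

The last case shows why the guess is error-prone: some Latin-1 byte
sequences happen to be well-formed UTF-8, so a successful decode does not
prove the file was written as UTF-8.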
--
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum at hccnet.nl