PEP 263 comments

Wed Feb 27 05:53:08 EST 2002

>>>>> "Jason Orendorff" <jason at jorendorff.com> (JO) writes:

JO> Martin v. Loewis wrote:
>> To make some progress on PEP 263, I suggest that some of the open issues
>> are resolved as follows:

JO> Counter-proposal:

JO>  - Comment syntax: none.
JO>  - UTF-8 file signature: not supported.
JO>  - Python source code encoding: must always be UTF-8.

There are still non utf-8 files around, and not everyone has a utf-8
editor.

JO>  - Implementation: within the parser, everything's just
JO>    ordinary UTF-8 bytes.
JO>  - IDLE: always save UTF-8 unless otherwise directed.

JO> Advantage: simple, universal, easy, similar to what Java does.

Java does accept iso-latin-1 files as input. In fact, on my machine (Mac
OS X) it doesn't even accept utf-8 files with the utf-8 signature. And
utf-8 bytes inside string literals are interpreted as plain 8-bit
characters, meaning every byte becomes a separate character.
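A small Python 3 sketch of that last effect (this uses Python, not Java, and assumes that "every byte is a character" behaves like decoding the bytes as Latin-1): a two-byte utf-8 character shows up as two separate characters.

```python
# Encode a non-ASCII string as UTF-8, then decode those bytes as
# ISO-8859-1 (Latin-1), which maps every byte to exactly one character.
text = "café"
utf8_bytes = text.encode("utf-8")          # 'é' becomes the two bytes C3 A9
as_latin1 = utf8_bytes.decode("latin-1")   # each byte is now its own character

print(len(text), len(utf8_bytes), len(as_latin1))  # 4 5 5
print(as_latin1)                                   # cafÃ©
```

The string grows from 4 to 5 characters because the single "é" is split into "Ã" and "©".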

JO> No confusion about embedded 0x22 bytes in strings.  Also,
JO> stylistically I prefer not to have a document specify its own
JO> encoding, or for comments to affect the meaning of a source
JO> file.

Which 0x22 bytes?
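For context on the question (0x22 is the ASCII double quote): in utf-8, every lead and continuation byte of a multibyte character is 0x80 or higher, so a 0x22 byte is always a literal quote. The Python 3 sketch below checks that property over all code points. For contrast, it also encodes a character in ISO-2022-JP, a 7-bit encoding whose double-byte characters reuse the 0x21..0x7E range; the helper name is hypothetical, and this only illustrates where embedded 0x22 bytes can occur, not which encoding Jason had in mind.

```python
def multibyte_utf8_avoids_ascii() -> bool:
    """True if no non-ASCII code point encodes to a UTF-8 byte below 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True

# In UTF-8, the only 0x22 bytes are the two real quote characters.
quoted = '"ü"'.encode("utf-8")
print(quoted.count(0x22))  # 2

# In ISO-2022-JP, the JIS code for HIRAGANA A is 0x2422, so its
# encoded bytes contain 0x22 even though there is no quote character.
jis = "あ".encode("iso2022_jp")
print(0x22 in jis)  # True
```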

To a certain extent it is possible to autodetect whether a file with 8-bit
characters (highest bit set) could be utf-8, but it is error-prone.
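One simple form of such a check, sketched in Python 3 under the assumption that "could be utf-8" means "contains high-bit bytes and decodes as strict utf-8" (the function name is hypothetical), also shows why it is error-prone: some Latin-1 text happens to be valid utf-8 too.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: data has at least one high-bit byte and is strict UTF-8."""
    if data.isascii():
        return False  # no 8-bit characters, so there is nothing to detect
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True

latin1_text = "Größe".encode("latin-1")  # 0xF6 0xDF is not valid UTF-8
utf8_text = "Größe".encode("utf-8")
# Latin-1 "Ã©" is the byte pair C3 A9, which is also valid UTF-8 for "é".
ambiguous = "Ã©".encode("latin-1")

print(looks_like_utf8(latin1_text))  # False
print(looks_like_utf8(utf8_text))    # True
print(looks_like_utf8(ambiguous))    # True, although the file was Latin-1
```

The last case is a false positive: the bytes decode cleanly as utf-8, so the heuristic cannot tell which encoding the author actually used.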

-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum at hccnet.nl