[Python-checkins] CVS: python/nondist/peps pep-0263.txt,1.10,1.11

Fri, 15 Mar 2002 09:07:14 -0800

Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv21038

Modified Files:
	pep-0263.txt 
Log Message:
Changed Python's source code encoding default to ASCII.

Added note about handling of Unicode literals in phase 1.
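
For illustration only, the behaviour described in this log message can be
sketched roughly as follows. This is not the tokenizer implementation. The
regular expression and the helper names detect_source_encoding and
check_source are assumptions made for this sketch and are not part of the
diff below. The sketch looks for a magic comment in the first two lines,
falls back to ASCII when none is found, and then checks that the whole file
decodes with that encoding.

```python
import re

# Assumed pattern for the magic comment (e.g. "# -*- coding: latin-1 -*-").
# It is not taken from this diff; treat it as illustrative.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def detect_source_encoding(source: bytes) -> str:
    """Return the encoding declared in the first two lines, else 'ascii'."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    # No magic comment found: the default is ASCII.
    return "ascii"


def check_source(source: bytes) -> bool:
    """Return True if the complete file decodes with its encoding."""
    encoding = detect_source_encoding(source)
    try:
        source.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Undecodable bytes or an unknown encoding name; the PEP's phase 1
        # would issue a single warning per file in the undecodable case.
        return False
    return True
```

Under these assumptions, a file containing a Latin-1 byte but no magic
comment fails the check, while the same file with a "coding: latin-1"
comment on one of its first two lines passes.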

Index: pep-0263.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0263.txt,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** pep-0263.txt	7 Mar 2002 11:14:26 -0000	1.10
--- pep-0263.txt	15 Mar 2002 17:07:12 -0000	1.11
***************
*** 41,48 ****
  Defining the Encoding

!     Just as in coercion of strings to Unicode, Python will default to
!     the interpreter's default encoding (which is ASCII in standard
!     Python installations) as standard encoding if no other encoding
!     hints are given.

      To define a source code encoding, a magic comment must
--- 41,46 ----
  Defining the Encoding

!     Python will default to ASCII as standard encoding if no other
!     encoding hints are given.

      To define a source code encoding, a magic comment must
***************
*** 77,86 ****
         source code.

!        Any encoding which allows processing the first two lines in
!        the way indicated above is allowed as source code encoding,
!        this includes ASCII compatible encodings as well as certain
         multi-byte encodings such as Shift_JIS. It does not include
!        encodings which use two or more bytes for all characters
!        like e.g. UTF-16. The reason for this is to keep the encoding
         detection algorithm in the tokenizer simple.

--- 75,84 ----
         source code.

!        Any encoding which allows processing the first two lines in the
!        way indicated above is allowed as source code encoding, this
!        includes ASCII compatible encodings as well as certain
         multi-byte encodings such as Shift_JIS. It does not include
!        encodings which use two or more bytes for all characters like
!        e.g. UTF-16. The reason for this is to keep the encoding
         detection algorithm in the tokenizer simple.

***************
*** 117,133 ****
      require major changes in the internals of the interpreter and
      enforcing the use of magic comments in source code files which
!     place non-default encoding characters in string literals, comments
      and Unicode literals, the proposed solution should be implemented
      in two phases:

!     1. Implement the magic comment detection and default encoding
!        handling, but only apply the detected encoding to Unicode
!        literals in the source file.

         In addition to this step and to aid in the transition to
         explicit encoding declaration, the tokenizer must check the
!        complete source file for compliance with the default encoding
!        (which usually is ASCII). If the source file does not properly
!        decode, a single warning is generated per file.

      2. Change the tokenizer/compiler base string type from char* to
--- 115,134 ----
      require major changes in the internals of the interpreter and
      enforcing the use of magic comments in source code files which
!     place non-ASCII characters in string literals, comments
      and Unicode literals, the proposed solution should be implemented
      in two phases:

!     1. Implement the magic comment detection, but only apply the
!        detected encoding to Unicode literals in the source file.
! 
!        If no magic comment is used, Python should continue to
!        use the standard [raw-]unicode-escape codecs for Unicode
!        literals.

         In addition to this step and to aid in the transition to
         explicit encoding declaration, the tokenizer must check the
!        complete source file for compliance with the declared
!        encoding. If the source file does not properly decode, a single
!        warning is generated per file.

      2. Change the tokenizer/compiler base string type from char* to