[Python-checkins] CVS: python/nondist/peps pep-0263.txt,1.2,1.3

M.-A. Lemburg lemburg@users.sourceforge.net
Tue, 26 Feb 2002 02:01:28 -0800


Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv6726

Modified Files:
	pep-0263.txt 
Log Message:
Adapted to Martin's comments.



Index: pep-0263.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0263.txt,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** pep-0263.txt	14 Aug 2001 17:39:05 -0000	1.2
--- pep-0263.txt	26 Feb 2002 10:01:25 -0000	1.3
***************
*** 38,41 ****
--- 38,62 ----
      Python source code data.
  
+ Defining the Encoding
+ 
+     Python will default to Latin-1 as the standard encoding if no
+     other encoding hints are given.
+ 
+     To define a source code encoding, a magic comment must
+     be placed into the source file as either the first or the
+     second line in the file:
+ 
+           #!/usr/bin/python
+           # -*- coding: <encoding name> -*-
+ 
+     To aid with platforms such as Windows, which add Unicode byte order
+     marks (BOMs) to the beginning of Unicode files, the UTF-8 signature
+     '\xef\xbb\xbf' will be interpreted as 'utf-8' encoding as well
+     (even if no magic encoding comment is given).
+ 
+     If a source file uses both the UTF-8 BOM signature and a
+     magic encoding comment, the only allowed encoding for the comment
+     is 'utf-8'.  Any other encoding will cause an error.
+ 
  Concepts
  
***************
*** 46,50 ****
         Embedding of differently encoded data is not allowed and will
         result in a decoding error during compilation of the Python
!        source code.
  
      2. Handling of escape sequences should continue to work as it does 
--- 67,77 ----
         Embedding of differently encoded data is not allowed and will
         result in a decoding error during compilation of the Python
!        source code. 
! 
!        Only ASCII-compatible encodings are allowed as source code
!        encodings, to ensure that Python language elements other than
!        literals and comments remain readable by ASCII processing tools
!        and to avoid problems with wide-character encodings such as
!        UTF-16.
  
      2. Handling of escape sequences should continue to work as it does 
***************
*** 72,112 ****
            compatibility with the existing implementation
  
!           ISSUE: 
! 
!               Should we restrict identifiers to ASCII ?
! 
!        To make this backwards compatible, the implementation would have to
!        assume Latin-1 as the original file encoding if not given (otherwise,
!        binary data currently stored in 8-bit strings wouldn't make the
!        roundtrip).
! 
! Comment Syntax
! 
!     The magic comment will use the following syntax. It will have to
!     appear as first or second line in the Python source file.
! 
!     ISSUE:
! 
!         Possible choices for the format:
! 
!         1. Emacs style:
! 
!           #!/usr/bin/python
!           # -*- coding: utf-8; -*-
! 
!         2. Via a pseudo-option to the interpreter (one which is not used
!            by the interpreter):
  
!           #!/usr/bin/python --encoding=utf-8
  
!         3. Using a special comment format:
  
!           #!/usr/bin/python
!           #!encoding = 'utf-8'
  
!         4. XML-style format:
  
!           #!/usr/bin/python
!           #?python encoding = 'utf-8'
  
  Scope
--- 99,122 ----
            compatibility with the existing implementation
  
!        Note that Python identifiers are restricted to the ASCII
!        subset of the encoding.
  
!     For backwards compatibility, the implementation must assume
!     Latin-1 as the original file encoding if none is given (otherwise,
!     binary data currently stored in 8-bit strings would not survive
!     the round trip).
  
! Implementation
  
!     Since changing the Python tokenizer/parser combination will
!     require major changes in the internals of the interpreter, the
!     proposed solution should be implemented in two phases:
  
!     1. Implement the magic comment detection and default encoding
!        handling, but only apply the detected encoding to Unicode
!        literals in the source file.
  
!     2. Change the tokenizer/compiler base string type from char* to
!        Py_UNICODE* and apply the encoding to the complete file.
  
  Scope
***************
*** 115,119 ****
      proposed magic comment. Without the magic comment in the proposed
      position, Python will treat the source file as it does currently
!     to maintain backwards compatibility.
  
  Copyright
--- 125,136 ----
      proposed magic comment. Without the magic comment in the proposed
      position, Python will treat the source file as it does currently
!     (using the Latin-1 encoding assumption) to maintain backwards
!     compatibility.
! 
! History
! 
!     1.3: Incorporated comments by Martin v. Loewis:
!          UTF-8 BOM detection, Emacs-style magic comment,
!          two-phase approach to the implementation
  
  Copyright
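
The detection rules added in revision 1.3 (UTF-8 signature, magic
comment on the first or second line, Latin-1 default) can be sketched
in Python as follows. This is a minimal illustration and not the
interpreter's implementation: the function name detect_source_encoding,
the MAGIC_COMMENT pattern, and the use of SyntaxError for the conflict
case are assumptions made for this example, and encoding aliases such
as 'utf8' are not normalized.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Hypothetical pattern for the Emacs-style comment
# "# -*- coding: <encoding name> -*-"; the PEP text does not fix a regex.
MAGIC_COMMENT = re.compile(
    rb"^[ \t]*#.*?-\*-[ \t]*coding:[ \t]*([-\w.]+)[ \t]*;?[ \t]*-\*-"
)


def detect_source_encoding(data: bytes) -> str:
    """Return the encoding name implied by the raw bytes of a source file."""
    has_bom = data.startswith(UTF8_BOM)
    if has_bom:
        data = data[len(UTF8_BOM):]

    # Only the first two lines may carry the magic comment.
    declared = None
    for line in data.splitlines()[:2]:
        match = MAGIC_COMMENT.match(line)
        if match:
            declared = match.group(1).decode("ascii").lower()
            break

    if has_bom:
        # With a UTF-8 signature, the only allowed declaration is 'utf-8'.
        if declared is not None and declared != "utf-8":
            raise SyntaxError(
                "encoding %r conflicts with the UTF-8 signature" % declared
            )
        return "utf-8"

    # No hints at all: fall back to the Latin-1 default.
    return declared if declared is not None else "latin-1"
```

A caller would read the file in binary mode and pass the bytes in, for
example detect_source_encoding(open(path, "rb").read()), where path is
whatever file is being inspected.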