[Python-checkins] CVS: python/nondist/peps pep-0263.txt,1.2,1.3
M.-A. Lemburg
lemburg@users.sourceforge.net
Tue, 26 Feb 2002 02:01:28 -0800
Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv6726
Modified Files:
pep-0263.txt
Log Message:
Adapted to Martin's comments.
Index: pep-0263.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0263.txt,v
retrieving revision 1.2
retrieving revision 1.3
diff -C2 -d -r1.2 -r1.3
*** pep-0263.txt 14 Aug 2001 17:39:05 -0000 1.2
--- pep-0263.txt 26 Feb 2002 10:01:25 -0000 1.3
***************
*** 38,41 ****
--- 38,62 ----
Python source code data.
+ Defining the Encoding
+
+ Python will default to Latin-1 as the standard encoding if no other
+ encoding hints are given.
+
+ To define a source code encoding, a magic comment must
+ be placed in the source file as either the first or second
+ line of the file:
+
+ #!/usr/bin/python
+ # -*- coding: <encoding name> -*-
+
+ To aid with platforms such as Windows, which add Unicode BOMs
+ to the beginning of Unicode files, the UTF-8 signature
+ '\xef\xbb\xbf' will be interpreted as 'utf-8' encoding as well
+ (even if no magic encoding comment is given).
+
+ If a source file uses both the UTF-8 BOM signature and a
+ magic encoding comment, the only allowed encoding for the comment
+ is 'utf-8'. Any other encoding will cause an error.
+
Concepts
***************
*** 46,50 ****
Embedding of differently encoded data is not allowed and will
result in a decoding error during compilation of the Python
! source code.
2. Handling of escape sequences should continue to work as it does
--- 67,77 ----
Embedding of differently encoded data is not allowed and will
result in a decoding error during compilation of the Python
! source code.
!
! Only ASCII-compatible encodings are allowed as source code
! encoding to ensure that Python language elements other than
! literals and comments remain readable by ASCII processing tools
! and to avoid problems with wide character encodings such as
! UTF-16.
2. Handling of escape sequences should continue to work as it does
***************
*** 72,112 ****
compatibility with the existing implementation
! ISSUE:
!
! Should we restrict identifiers to ASCII ?
!
! To make this backwards compatible, the implementation would have to
! assume Latin-1 as the original file encoding if not given (otherwise,
! binary data currently stored in 8-bit strings wouldn't make the
! roundtrip).
!
! Comment Syntax
!
! The magic comment will use the following syntax. It will have to
! appear as first or second line in the Python source file.
!
! ISSUE:
!
! Possible choices for the format:
!
! 1. Emacs style:
!
! #!/usr/bin/python
! # -*- coding: utf-8; -*-
!
! 2. Via a pseudo-option to the interpreter (one which is not used
! by the interpreter):
! #!/usr/bin/python --encoding=utf-8
! 3. Using a special comment format:
! #!/usr/bin/python
! #!encoding = 'utf-8'
! 4. XML-style format:
! #!/usr/bin/python
! #?python encoding = 'utf-8'
Scope
--- 99,122 ----
compatibility with the existing implementation
! Note that Python identifiers are restricted to the ASCII
! subset of the encoding.
! For backwards compatibility, the implementation must assume
! Latin-1 as the original file encoding if not given (otherwise,
! binary data currently stored in 8-bit strings wouldn't make the
! roundtrip).
! Implementation
! Since changing the Python tokenizer/parser combination will
! require major changes in the internals of the interpreter, the
! proposed solution should be implemented in two phases:
! 1. Implement the magic comment detection and default encoding
! handling, but only apply the detected encoding to Unicode
! literals in the source file.
! 2. Change the tokenizer/compiler base string type from char* to
! Py_UNICODE* and apply the encoding to the complete file.
Scope
***************
*** 115,119 ****
proposed magic comment. Without the magic comment in the proposed
position, Python will treat the source file as it does currently
! to maintain backwards compatibility.
Copyright
--- 125,136 ----
proposed magic comment. Without the magic comment in the proposed
position, Python will treat the source file as it does currently
! (using the Latin-1 encoding assumption) to maintain backwards
! compatibility.
!
! History
!
! 1.3: Incorporated comments by Martin v. Loewis:
! UTF-8 BOM detection, Emacs style magic comment,
! two-phase approach to the implementation
Copyright
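
For readers of this checkin, the detection rules added in revision 1.3 can be
sketched roughly as follows. This is only an illustration of the text in the
diff, not the interpreter's implementation: the function name
detect_source_encoding, the constant names, and the regular expression are
assumptions made up for this example, and the comparison against 'utf-8' is a
plain lowercase string match with no alias handling.

```python
import re

# UTF-8 signature quoted in the PEP text.
UTF8_BOM = b"\xef\xbb\xbf"

# Hypothetical pattern for the Emacs-style magic comment:
#   # -*- coding: <encoding name> -*-
CODING_RE = re.compile(
    rb"^[ \t]*#.*?-\*-[ \t]*coding:[ \t]*([-\w.]+)[ \t]*;?[ \t]*-\*-"
)


def detect_source_encoding(data: bytes) -> str:
    """Return the source encoding implied by the rules in the diff above."""
    has_bom = data.startswith(UTF8_BOM)
    if has_bom:
        data = data[len(UTF8_BOM):]

    # The magic comment only counts on the first or second line.
    declared = None
    for line in data.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            declared = match.group(1).decode("ascii").lower()
            break

    if has_bom:
        # With a BOM, the only allowed declared encoding is 'utf-8'.
        if declared not in (None, "utf-8"):
            raise SyntaxError(
                "UTF-8 BOM conflicts with declared encoding %r" % declared
            )
        return "utf-8"

    # No hint at all: fall back to Latin-1 for backwards compatibility.
    return declared or "latin-1"
```

Under these assumptions, a file starting with the two example lines from the
PEP text (the #! line followed by a coding comment naming utf-8) yields
'utf-8', a file with neither a BOM nor a magic comment yields 'latin-1', and a
BOM combined with a comment naming any other encoding raises an error.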