[Python-checkins] r82120 - python/trunk/Doc/howto/unicode.rst

andrew.kuchling python-checkins at python.org
Sun Jun 20 23:45:45 CEST 2010


Author: andrew.kuchling
Date: Sun Jun 20 23:45:45 2010
New Revision: 82120

Log:
Note that Python 3.x isn't covered; add forward ref. for UTF-8; note error in 2.5 and up

Modified:
   python/trunk/Doc/howto/unicode.rst

Modified: python/trunk/Doc/howto/unicode.rst
==============================================================================
--- python/trunk/Doc/howto/unicode.rst	(original)
+++ python/trunk/Doc/howto/unicode.rst	Sun Jun 20 23:45:45 2010
@@ -2,10 +2,12 @@
   Unicode HOWTO
 *****************
 
-:Release: 1.02
+:Release: 1.03
 
-This HOWTO discusses Python's support for Unicode, and explains various problems
-that people commonly encounter when trying to work with Unicode.
+This HOWTO discusses Python 2.x's support for Unicode, and explains
+various problems that people commonly encounter when trying to work
+with Unicode.  (This HOWTO has not yet been updated to cover the 3.x
+versions of Python.)
 
 Introduction to Unicode
 =======================
@@ -144,8 +146,9 @@
 4. Many Internet standards are defined in terms of textual data, and can't
    handle content with embedded zero bytes.
 
-Generally people don't use this encoding, instead choosing other encodings that
-are more efficient and convenient.
+Generally people don't use this encoding, instead choosing other
+encodings that are more efficient and convenient.  UTF-8 is probably
+the most commonly supported encoding; it will be discussed below.
 
 Encodings don't have to handle every possible Unicode character, and most
 encodings don't.  For example, Python's default encoding is the 'ascii'
@@ -222,8 +225,8 @@
 <http://en.wikipedia.org/wiki/UTF-8>, for example.
 
 
-Python's Unicode Support
-========================
+Python 2.x's Unicode Support
+============================
 
 Now that you've learned the rudiments of Unicode, we can look at Python's
 Unicode features.
@@ -272,7 +275,7 @@
     >>> unicode('\x80abc', errors='ignore')
     u'abc'
 
-Encodings are specified as strings containing the encoding's name.  Python 2.4
+Encodings are specified as strings containing the encoding's name.  Python 2.7
 comes with roughly 100 different encodings; see the Python Library Reference at
 :ref:`standard-encodings` for a list.  Some encodings
 have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
@@ -427,11 +430,19 @@
 
 When you run it with Python 2.4, it will output the following warning::
 
-    amk:~$ python p263.py
+    amk:~$ python2.4 p263.py
     sys:1: DeprecationWarning: Non-ASCII character '\xe9'
          in file p263.py on line 2, but no encoding declared;
          see http://www.python.org/peps/pep-0263.html for details
 
+Python 2.5 and higher are stricter and will produce a syntax error::
+
+    amk:~$ python2.5 p263.py
+    File "/tmp/p263.py", line 2
+    SyntaxError: Non-ASCII character '\xc3' in file /tmp/p263.py
+      on line 2, but no encoding declared; see
+      http://www.python.org/peps/pep-0263.html for details
+
 
 Unicode Properties
 ------------------
@@ -693,7 +704,11 @@
 
 Version 1.02: posted August 16 2005.  Corrects factual errors.
 
+Version 1.03: posted June 20 2010.  Notes that Python 3.x is not covered,
+and that the HOWTO only covers 2.x.
+
 
+.. comment Describe Python 3.x support (new section? new document?)
 .. comment Additional topic: building Python w/ UCS2 or UCS4 support
 .. comment Describe obscure -U switch somewhere?
 .. comment Describe use of codecs.StreamRecoder and StreamReaderWriter

