[Python-checkins] r81168 - in python/branches/py3k: Doc/c-api/unicode.rst Include/unicodeobject.h

Fri May 14 17:58:55 CEST 2010

Author: victor.stinner
Date: Fri May 14 17:58:55 2010
New Revision: 81168

Log:
Issue #8711: Document PyUnicode_DecodeFSDefault*() functions

 * Add paragraph titles to c-api/unicode.rst.
 * Fix PyUnicode_DecodeFSDefault*() comment: it now uses the "surrogateescape"
   error handler (and not "replace")
 * Remove "The function is intended to be used for paths and file names only
   during bootstrapping process where the codecs are not set up." from
   PyUnicode_DecodeFSDefault() comment: it is used after the bootstrapping and
   for purposes other than file names


Modified:
   python/branches/py3k/   (props changed)
   python/branches/py3k/Doc/c-api/unicode.rst
   python/branches/py3k/Include/unicodeobject.h

Modified: python/branches/py3k/Doc/c-api/unicode.rst
==============================================================================

--- python/branches/py3k/Doc/c-api/unicode.rst	(original)
+++ python/branches/py3k/Doc/c-api/unicode.rst	Fri May 14 17:58:55 2010
@@ -10,11 +10,12 @@
 Unicode Objects
 ^^^^^^^^^^^^^^^
 
+Unicode Type
+""""""""""""
+
 These are the basic Unicode object types used for the Unicode implementation in
 Python:
 
-.. % --- Unicode Type -------------------------------------------------------
-
 
 .. ctype:: Py_UNICODE
 
@@ -89,12 +90,13 @@
    Clear the free list. Return the total number of freed items.
 
 
+Unicode Character Properties
+""""""""""""""""""""""""""""
+
 Unicode provides many different character properties. The most often needed ones
 are available through these macros which are mapped to C functions depending on
 the Python configuration.
 
-.. % --- Unicode character properties ---------------------------------------
-
 
 .. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
 
@@ -192,11 +194,13 @@
    Return the character *ch* converted to a double. Return ``-1.0`` if this is not
    possible.  This macro does not raise exceptions.
 
+
+Plain Py_UNICODE
+""""""""""""""""
+
 To create Unicode objects and access their basic sequence properties, use these
 APIs:
 
-.. % --- Plain Py_UNICODE ---------------------------------------------------
-
 
 .. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
 
@@ -364,9 +368,47 @@
 Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
 the system's :ctype:`wchar_t`.
 
-.. % --- wchar_t support for platforms which support it ---------------------
+
+File System Encoding
+""""""""""""""""""""
+
+To encode and decode file names and other environment strings,
+:cdata:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
+``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
+encode file names during argument parsing, the ``"O&"`` converter should be
+used, passing :cfunc:`PyUnicode_FSConverter` as the conversion function:
+
+.. cfunction:: int PyUnicode_FSConverter(PyObject* obj, void* result)
+
+   Convert *obj* into *result*, using :cdata:`Py_FileSystemDefaultEncoding`
+   and the ``"surrogateescape"`` error handler. *result* must be the address
+   of a ``PyObject*`` variable, which receives a :func:`bytes` object that
+   must be released when it is no longer used.
+
+   .. versionadded:: 3.1
+
+.. cfunction:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
+
+   Decode a null-terminated string using :cdata:`Py_FileSystemDefaultEncoding`
+   and the ``"surrogateescape"`` error handler.
+
+   If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.
+
+   Use :cfunc:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
+
+.. cfunction:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
+
+   Decode *size* bytes of *s* using :cdata:`Py_FileSystemDefaultEncoding` and
+   the ``"surrogateescape"`` error handler.
+
+   If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.
 
 
+wchar_t Support
+"""""""""""""""
+
+wchar_t support for platforms which support it:
+
 .. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
 
    Create a Unicode object from the :ctype:`wchar_t` buffer *w* of the given size.
@@ -413,9 +455,11 @@
 The codecs all use a similar interface.  Only deviation from the following
 generic ones are documented for simplicity.
 
-These are the generic codec APIs:
 
-.. % --- Generic Codecs -----------------------------------------------------
+Generic Codecs
+""""""""""""""
+
+These are the generic codec APIs:
 
 
 .. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
@@ -444,9 +488,11 @@
    using the Python codec registry. Return *NULL* if an exception was raised by
    the codec.
 
-These are the UTF-8 codec APIs:
 
-.. % --- UTF-8 Codecs -------------------------------------------------------
+UTF-8 Codecs
+""""""""""""
+
+These are the UTF-8 codec APIs:
 
 
 .. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
@@ -476,9 +522,11 @@
    object.  Error handling is "strict".  Return *NULL* if an exception was
    raised by the codec.
 
-These are the UTF-32 codec APIs:
 
-.. % --- UTF-32 Codecs ------------------------------------------------------ */
+UTF-32 Codecs
+"""""""""""""
+
+These are the UTF-32 codec APIs:
 
 
 .. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
@@ -543,9 +591,10 @@
    Return *NULL* if an exception was raised by the codec.
 
 
-These are the UTF-16 codec APIs:
+UTF-16 Codecs
+"""""""""""""
 
-.. % --- UTF-16 Codecs ------------------------------------------------------ */
+These are the UTF-16 codec APIs:
 
 
 .. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
@@ -609,9 +658,11 @@
    order. The string always starts with a BOM mark.  Error handling is "strict".
    Return *NULL* if an exception was raised by the codec.
 
-These are the "Unicode Escape" codec APIs:
 
-.. % --- Unicode-Escape Codecs ----------------------------------------------
+Unicode-Escape Codecs
+"""""""""""""""""""""
+
+These are the "Unicode Escape" codec APIs:
 
 
 .. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
@@ -633,9 +684,11 @@
    string object.  Error handling is "strict". Return *NULL* if an exception was
    raised by the codec.
 
-These are the "Raw Unicode Escape" codec APIs:
 
-.. % --- Raw-Unicode-Escape Codecs ------------------------------------------
+Raw-Unicode-Escape Codecs
+"""""""""""""""""""""""""
+
+These are the "Raw Unicode Escape" codec APIs:
 
 
 .. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
@@ -657,11 +710,13 @@
    Python string object. Error handling is "strict". Return *NULL* if an exception
    was raised by the codec.
 
+
+Latin-1 Codecs
+""""""""""""""
+
 These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
 ordinals and only these are accepted by the codecs during encoding.
 
-.. % --- Latin-1 Codecs -----------------------------------------------------
-
 
 .. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
 
@@ -682,11 +737,13 @@
    object.  Error handling is "strict".  Return *NULL* if an exception was
    raised by the codec.
 
+
+ASCII Codecs
+""""""""""""
+
 These are the ASCII codec APIs.  Only 7-bit ASCII data is accepted. All other
 codes generate errors.
 
-.. % --- ASCII Codecs -------------------------------------------------------
-
 
 .. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
 
@@ -707,9 +764,11 @@
    object.  Error handling is "strict".  Return *NULL* if an exception was
    raised by the codec.
 
-These are the mapping codec APIs:
 
-.. % --- Character Map Codecs -----------------------------------------------
+Character Map Codecs
+""""""""""""""""""""
+
+These are the mapping codec APIs:
 
 This codec is special in that it can be used to implement many different codecs
 (and this is in fact what was done to obtain most of the standard codecs
@@ -778,7 +837,9 @@
 DBCS) is a class of encodings, not just one.  The target encoding is defined by
 the user settings on the machine running the codec.
 
-.. % --- MBCS codecs for Windows --------------------------------------------
+
+MBCS codecs for Windows
+"""""""""""""""""""""""
 
 
 .. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
@@ -808,20 +869,9 @@
    object.  Error handling is "strict".  Return *NULL* if an exception was
    raised by the codec.
 
-For decoding file names and other environment strings, :cdata:`Py_FileSystemEncoding`
-should be used as the encoding, and ``"surrogateescape"`` should be used as the error
-handler. For encoding file names during argument parsing, the ``O&`` converter should
-be used, passsing PyUnicode_FSConverter as the conversion function:
-
-.. cfunction:: int PyUnicode_FSConverter(PyObject* obj, void* result)
-
-   Convert *obj* into *result*, using the file system encoding, and the ``surrogateescape``
-   error handler. *result* must be a ``PyObject*``, yielding a bytes or bytearray object
-   which must be released if it is no longer used.
-
-   .. versionadded:: 3.1
 
-.. % --- Methods & Slots ----------------------------------------------------
+Methods & Slots
+"""""""""""""""
 
 
 .. _unicodemethodsandslots:

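The File System Encoding section added to unicode.rst relies on the "surrogateescape" error handler from PEP 383. The sketch below illustrates that handler's round-trip idea under a simplifying assumption: the file system encoding is plain ASCII. Both helper names are hypothetical and are not part of the CPython API. Each byte that cannot be decoded (0x80-0xFF) becomes a lone surrogate in U+DC80..U+DCFF, and encoding maps those surrogates back to the exact original bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not CPython API): decode ASCII bytes with a
   PEP 383-style "surrogateescape" handler. Bytes 0x00-0x7F decode
   normally; every undecodable byte 0x80-0xFF becomes the lone
   surrogate U+DC00 + byte, i.e. U+DC80..U+DCFF. Writes one code
   point per input byte and returns the number written. */
size_t ascii_decode_surrogateescape(const unsigned char *in, size_t len,
                                    uint32_t *out)
{
    assert(in != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        out[i] = (in[i] < 0x80) ? in[i] : 0xDC00u + in[i];
    return len;
}

/* Hypothetical inverse: ASCII code points are copied, escaped bytes
   U+DC80..U+DCFF are turned back into 0x80..0xFF. Returns the number
   of bytes written, or -1 if a code point is neither (a "strict"
   failure; `out` may then be partially written). */
long ascii_encode_surrogateescape(const uint32_t *in, size_t len,
                                  unsigned char *out)
{
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80)
            out[i] = (unsigned char)in[i];
        else if (in[i] >= 0xDC80u && in[i] <= 0xDCFFu)
            out[i] = (unsigned char)(in[i] - 0xDC00u);
        else
            return -1;
    }
    return (long)len;
}
```

Because the mapping is reversible, a file name that is not valid in the file system encoding can still be decoded to str and later encoded back byte for byte. The old behaviour described in the removed header comment, replacing invalid characters with '?', could not do that.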
Modified: python/branches/py3k/Include/unicodeobject.h
==============================================================================
--- python/branches/py3k/Include/unicodeobject.h	(original)
+++ python/branches/py3k/Include/unicodeobject.h	Fri May 14 17:58:55 2010
@@ -1240,25 +1240,29 @@
 /* --- File system encoding ---------------------------------------------- */
 
 /* ParseTuple converter which converts a Unicode object into the file
-   system encoding as a bytes object, using the PEP 383 error handler; bytes
-   objects are output as-is. */
+   system encoding as a bytes object, using the "surrogateescape" error
+   handler; bytes objects are output as-is. */
 
 PyAPI_FUNC(int) PyUnicode_FSConverter(PyObject*, void*);
 
-/* Decode a null-terminated string using Py_FileSystemDefaultEncoding.
+/* Decode a null-terminated string using Py_FileSystemDefaultEncoding
+   and the "surrogateescape" error handler.
 
-   If the encoding is supported by one of the built-in codecs (i.e., UTF-8,
-   UTF-16, UTF-32, Latin-1 or MBCS), otherwise fallback to UTF-8 and replace
-   invalid characters with '?'.
+   If Py_FileSystemDefaultEncoding is not set, fall back to UTF-8.
 
-   The function is intended to be used for paths and file names only
-   during bootstrapping process where the codecs are not set up.
+   Use PyUnicode_DecodeFSDefaultAndSize() if you have the string length.
 */
 
 PyAPI_FUNC(PyObject*) PyUnicode_DecodeFSDefault(
     const char *s               /* encoded string */
     );
 
+/* Decode a string using Py_FileSystemDefaultEncoding
+   and the "surrogateescape" error handler.
+
+   If Py_FileSystemDefaultEncoding is not set, fall back to UTF-8.
+*/
+
 PyAPI_FUNC(PyObject*) PyUnicode_DecodeFSDefaultAndSize(
     const char *s,               /* encoded string */
     Py_ssize_t size              /* size */
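
The header comments separate PyUnicode_DecodeFSDefault() (null-terminated input) from PyUnicode_DecodeFSDefaultAndSize() (explicit length). The sketch below shows one plausible way such a pair fits together. It is an assumption, not CPython's code: the names fs_decode() and fs_decode_and_size() are hypothetical, and plain ASCII with surrogate escaping stands in for Py_FileSystemDefaultEncoding. The null-terminated variant measures the string with strlen() and delegates to the sized one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the sized decoder: decodes exactly `size`
   bytes (embedded NUL bytes included) as ASCII, escaping each byte
   0x80-0xFF to a lone surrogate U+DC80..U+DCFF as PEP 383 describes.
   Returns the number of code points written to `out`. */
size_t fs_decode_and_size(const char *s, size_t size, uint32_t *out)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char b = (unsigned char)s[i];
        out[i] = (b < 0x80) ? b : 0xDC00u + b;
    }
    return size;
}

/* Hypothetical null-terminated variant: measure the string, then
   delegate. Anything after the first NUL byte is never decoded. */
size_t fs_decode(const char *s, uint32_t *out)
{
    assert(s != NULL);
    return fs_decode_and_size(s, strlen(s), out);
}
```

Passing the length explicitly is what lets the sized variant decode data containing NUL bytes, and when the caller already knows the length it also avoids a redundant strlen() scan, which matches the header's advice to use PyUnicode_DecodeFSDefaultAndSize() in that case.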