[Python-checkins] cpython (merge 3.2 -> default): Explain the use of charset parameter with Content-Type header: issue11082

senthil.kumaran python-checkins at python.org
Fri Mar 16 02:15:45 CET 2012


http://hg.python.org/cpython/rev/90e35b91756d
changeset:   75720:90e35b91756d
parent:      75718:d0cce5a2c0cf
parent:      75719:057cf78ed576
user:        Senthil Kumaran <senthil at uthcode.com>
date:        Thu Mar 15 18:15:34 2012 -0700
summary:
  Explain the use of charset parameter with Content-Type header: issue11082

files:
  Doc/library/urllib.parse.rst   |   7 +-
  Doc/library/urllib.request.rst |  74 +++++++++++++++------
  Lib/urllib/request.py          |   5 +-
  3 files changed, 58 insertions(+), 28 deletions(-)


diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@@ -512,9 +512,10 @@
 
    Convert a mapping object or a sequence of two-element tuples, which may
    either be a :class:`str` or a :class:`bytes`,  to a "percent-encoded"
-   string.  The resultant string must be converted to bytes using the
-   user-specified encoding before it is sent to :func:`urlopen` as the optional
-   *data* argument.
+   string.  If the resultant string is to be used as the *data* for a POST
+   operation with the :func:`urlopen` function, it must first be encoded
+   to bytes; otherwise a :exc:`TypeError` is raised.
+
    The resulting string is a series of ``key=value`` pairs separated by ``'&'``
    characters, where both *key* and *value* are quoted using :func:`quote_plus`
    above. When a sequence of two-element tuples is used as the *query*
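A minimal sketch of the urlencode-then-encode step described in the urllib.parse change above. It assumes Python 3.7+ (for dict ordering) and uses arbitrary example field names. It also hints at why a charset matters: the same non-ASCII value is percent-encoded to different octets under UTF-8 and ISO-8859-1, so the server has to know which one was used.

```python
import urllib.parse

# Illustrative form fields; names and values are arbitrary.
# '\u00e9' is e-acute, so the last value is "cafe" with an accent.
fields = {'spam': 1, 'eggs': 2, 'bacon': 0, 'name': 'caf\u00e9'}

# urlencode() returns a str in application/x-www-form-urlencoded format.
# Non-ASCII characters are percent-encoded using the *encoding* argument
# (UTF-8 when it is not given), so the resulting str is plain ASCII.
query = urllib.parse.urlencode(fields)
print(query)  # spam=1&eggs=2&bacon=0&name=caf%C3%A9

# The same fields percent-encoded as ISO-8859-1 yield different octets.
latin1_query = urllib.parse.urlencode(fields, encoding='iso-8859-1')
print(latin1_query)  # spam=1&eggs=2&bacon=0&name=caf%E9

# urlopen() rejects str POST data, so encode to bytes before sending.
data = query.encode('utf-8')
print(type(data).__name__)  # bytes
```
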
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -2,9 +2,10 @@
 =============================================================
 
 .. module:: urllib.request
-   :synopsis: Next generation URL opening library.
+   :synopsis: Extensible library for opening URLs.
 .. moduleauthor:: Jeremy Hylton <jeremy at alum.mit.edu>
 .. sectionauthor:: Moshe Zadka <moshez at users.sourceforge.net>
+.. sectionauthor:: Senthil Kumaran <senthil at uthcode.com>
 
 
 The :mod:`urllib.request` module defines functions and classes which help in
@@ -20,16 +21,26 @@
    Open the URL *url*, which can be either a string or a
    :class:`Request` object.
 
-   *data* may be a bytes object specifying additional data to send to the
+   *data* must be a bytes object specifying additional data to be sent to the
    server, or ``None`` if no such data is needed. *data* may also be an
    iterable object and in that case Content-Length value must be specified in
    the headers. Currently HTTP requests are the only ones that use *data*; the
    HTTP request will be a POST instead of a GET when the *data* parameter is
-   provided.  *data* should be a buffer in the standard
+   provided.
+
+   *data* should be a buffer in the standard
    :mimetype:`application/x-www-form-urlencoded` format.  The
    :func:`urllib.parse.urlencode` function takes a mapping or sequence of
-   2-tuples and returns a string in this format. urllib.request module uses
-   HTTP/1.1 and includes ``Connection:close`` header in its HTTP requests.
+   2-tuples and returns a string in this format. It must be encoded to bytes
+   before being used as the *data* parameter. The charset parameter of the
+   ``Content-Type`` header may be used to specify the encoding. If no charset
+   parameter is sent with the ``Content-Type`` header, a server following the
+   HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1.
+   It is advisable to include a charset parameter naming the encoding used
+   in the ``Content-Type`` header of the :class:`Request`.
+
+   The :mod:`urllib.request` module uses HTTP/1.1 and includes a
+   ``Connection:close`` header in its HTTP requests.
 
    The optional *timeout* parameter specifies a timeout in seconds for
    blocking operations like the connection attempt (if not specified,
@@ -66,9 +77,10 @@
    are handled through the proxy when they are set.
 
    The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
-   discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
-   Proxy handling, which was done by passing a dictionary parameter to
-   ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.
+   discontinued; :func:`urllib.request.urlopen` corresponds to the old
+   ``urllib2.urlopen``.  Proxy handling, which was done by passing a dictionary
+   parameter to ``urllib.urlopen``, can be obtained by using
+   :class:`ProxyHandler` objects.
 
    .. versionchanged:: 3.2
       *cafile* and *capath* were added.
@@ -83,10 +95,11 @@
 .. function:: install_opener(opener)
 
    Install an :class:`OpenerDirector` instance as the default global opener.
-   Installing an opener is only necessary if you want urlopen to use that opener;
-   otherwise, simply call :meth:`OpenerDirector.open` instead of :func:`urlopen`.
-   The code does not check for a real :class:`OpenerDirector`, and any class with
-   the appropriate interface will work.
+   Installing an opener is only necessary if you want urlopen to use that
+   opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
+   :func:`~urllib.request.urlopen`.  The code does not check for a real
+   :class:`OpenerDirector`, and any class with the appropriate interface will
+   work.
 
 
 .. function:: build_opener([handler, ...])
@@ -138,13 +151,21 @@
 
    *url* should be a string containing a valid URL.
 
-   *data* may be a bytes object specifying additional data to send to the
+   *data* must be a bytes object specifying additional data to send to the
    server, or ``None`` if no such data is needed.  Currently HTTP requests are
    the only ones that use *data*; the HTTP request will be a POST instead of a
    GET when the *data* parameter is provided.  *data* should be a buffer in the
-   standard :mimetype:`application/x-www-form-urlencoded` format.  The
-   :func:`urllib.parse.urlencode` function takes a mapping or sequence of
-   2-tuples and returns a string in this format.
+   standard :mimetype:`application/x-www-form-urlencoded` format.
+
+   The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
+   2-tuples and returns a string in this format. It must be encoded to bytes
+   before being used as the *data* parameter. The charset parameter of the
+   ``Content-Type`` header may be used to specify the encoding. If no charset
+   parameter is sent with the ``Content-Type`` header, a server following the
+   HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1.
+   It is advisable to include a charset parameter naming the encoding used
+   in the ``Content-Type`` header of the :class:`Request`.
+
 
    *headers* should be a dictionary, and will be treated as if
    :meth:`add_header` was called with each key and value as arguments.
@@ -156,8 +177,11 @@
    :mod:`urllib`'s default user agent string is
    ``"Python-urllib/2.6"`` (on Python 2.6).
 
-   The following two arguments, *origin_req_host* and *unverifiable*,
-   are only of interest for correct handling of third-party HTTP cookies:
+   An example of using the ``Content-Type`` header with the *data* argument
+   would be passing *headers* as ``{"Content-Type": "application/x-www-form-urlencoded;charset=utf-8"}``.
+
+   The final two arguments are only of interest for correct handling
+   of third-party HTTP cookies:
 
    *origin_req_host* should be the request-host of the origin
    transaction, as defined by :rfc:`2965`.  It defaults to
@@ -1107,8 +1131,9 @@
    opener.open('http://www.example.com/')
 
 Also, remember that a few standard headers (:mailheader:`Content-Length`,
-:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
-:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
+:mailheader:`Content-Type` without a charset parameter, and :mailheader:`Host`)
+are added when the :class:`Request` is passed to :func:`urlopen` (or
+:meth:`OpenerDirector.open`).
 
 .. _urllib-examples:
 
@@ -1126,9 +1151,12 @@
 
    >>> import urllib.request
    >>> import urllib.parse
-   >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
-   >>> params = params.encode('utf-8')
-   >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
+   >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
+   >>> data = data.encode('utf-8')
+   >>> request = urllib.request.Request("http://requestb.in/xrbl82xr")
+   >>> # Add a charset parameter to the Content-Type header.
+   >>> request.add_header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8")
+   >>> f = urllib.request.urlopen(request, data)
    >>> print(f.read().decode('utf-8'))
 
 The following example uses an explicitly specified HTTP proxy, overriding
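The charset advice in the urllib.request changes can be illustrated without touching the network by building a :class:`Request` and inspecting it. This is only a sketch. The URL is a hypothetical placeholder, UTF-8 is an assumed choice of encoding, and the request is never sent.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; this sketch only builds the request, it never sends it.
url = 'http://www.example.com/form'

encoding = 'utf-8'  # assumed choice; must match the charset advertised below
data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}).encode(encoding)

headers = {
    'Content-Type': 'application/x-www-form-urlencoded;charset=' + encoding,
}
request = urllib.request.Request(url, data=data, headers=headers)

# Supplying data makes this a POST request.
print(request.get_method())                # POST
# Request normalizes header names with str.capitalize(), hence 'Content-type'.
print(request.get_header('Content-type'))  # application/x-www-form-urlencoded;charset=utf-8

# Actually sending it would be: urllib.request.urlopen(request)
```

Because the header is set explicitly, the handler's fallback (which adds a ``Content-type`` without a charset only when none is present) leaves it untouched.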
diff --git a/Lib/urllib/request.py b/Lib/urllib/request.py
--- a/Lib/urllib/request.py
+++ b/Lib/urllib/request.py
@@ -1172,8 +1172,9 @@
         if request.data is not None:  # POST
             data = request.data
             if isinstance(data, str):
-                raise TypeError("POST data should be bytes"
-                        " or an iterable of bytes. It cannot be str.")
+                msg = "POST data should be bytes or an iterable of bytes. "\
+                      "It cannot be str."
+                raise TypeError(msg)
             if not request.has_header('Content-type'):
                 request.add_unredirected_header(
                     'Content-type',

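The ``TypeError`` path in Lib/urllib/request.py can be observed directly. In this sketch the URL is hypothetical and UTF-8 is an assumed encoding. In current CPython the str check runs while urlopen() pre-processes the request, before any connection is opened. The exact message wording differs between Python versions, so only the exception type is relied upon.

```python
import urllib.parse
import urllib.request

url = 'http://www.example.com/form'         # hypothetical; never contacted
body = urllib.parse.urlencode({'spam': 1})  # a str, not bytes

error = None
try:
    # The str check fires during request pre-processing, before any
    # connection is opened, so this raises without network access.
    urllib.request.urlopen(url, data=body)
except TypeError as exc:
    error = exc
    print('rejected:', exc)

# The fix is to encode the str to bytes (UTF-8 assumed) before sending.
fixed = body.encode('utf-8')
print(fixed)  # b'spam=1'
```
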
-- 
Repository URL: http://hg.python.org/cpython

