[Python-checkins] r80092 - python/branches/py3k/Doc/library/urllib.request.rst

Ezio Melotti ezio.melotti at gmail.com
Mon Apr 19 09:49:38 CEST 2010


Hi,

On 15/04/2010 20.18, senthil.kumaran wrote:
> Author: senthil.kumaran
> Date: Thu Apr 15 19:18:22 2010
> New Revision: 80092
>
> Log:
> Fix Issue5419 - explaining bytes return value of urlopen, use of .decode() to convert to str.
>
>
> Modified:
>     python/branches/py3k/Doc/library/urllib.request.rst
>
> Modified: python/branches/py3k/Doc/library/urllib.request.rst
> ==============================================================================
> --- python/branches/py3k/Doc/library/urllib.request.rst	(original)
> +++ python/branches/py3k/Doc/library/urllib.request.rst	Thu Apr 15 19:18:22 2010
> @@ -1073,23 +1073,39 @@
>   --------
>
>   This example gets the python.org main page and displays the first 100 bytes of
> -it::
> +it.::
>
>      >>>  import urllib.request
>      >>>  f = urllib.request.urlopen('http://www.python.org/')
>      >>>  print(f.read(100))
> +   b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> +<?xml-stylesheet href="./css/ht2html'
> +
python.org doesn't use this doctype anymore (luckily), so this example 
(and the other ones later) should be updated.
> +Note that in Python 3, urlopen returns a bytes object by default. In many
> +circumstances, you might expect the output of urlopen to be a string. This
> +might be a carried over expectation from Python 2, where urlopen returned
> +string or it might even the common usecase. In those cases, you should
> +explicitly decode the bytes to string.
> +
> +In the examples below, we have chosen *utf-8* encoding for demonstration, you
> +might choose the encoding which is suitable for the webpage you are
> +requesting::
In real-world situations it is not possible to just pick an encoding and 
use it to decode the result. The example should show how to read the 
encoding from the HTTP headers (see the sketch below) and possibly warn 
that the encoding might be missing or incorrect. The encoding can also be 
specified in other places, such as the XML declaration (for XHTML pages 
only) and the <meta> tag (the headers have higher priority than XML 
declarations and meta tags).
Since the next step after decoding is often parsing, it could also be 
mentioned that libraries to parse HTML are usually already able to 
decode the source automatically, so there's no need to search for the 
encoding and decode manually.
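
For instance, a sketch of what the example could look like when the 
charset is taken from the Content-Type header (the 'utf-8' fallback here 
is only an illustrative default for when the header doesn't specify one):

    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
    >>> # get_content_charset() returns None if the header carries no charset
    >>> encoding = f.headers.get_content_charset() or 'utf-8'
    >>> print(f.read(100).decode(encoding))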
> +
> +>>>  import urllib.request
> +>>>  f = urllib.request.urlopen('http://www.python.org/')
> +>>>  print(f.read(100).decode('utf-8')
A ')' is missing here.
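Presumably the line should read:

    >>> print(f.read(100).decode('utf-8'))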
>      <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
>      <?xml-stylesheet href="./css/ht2html
>
> -Here we are sending a data-stream to the stdin of a CGI and reading the data it
> -returns to us. Note that this example will only work when the Python
> -installation supports SSL. ::
> +In the following example, we are sending a data-stream to the stdin of a CGI
> +and reading the data it returns to us. Note that this example will only work
> +when the Python installation supports SSL. ::
>
>      >>>  import urllib.request
>      >>>  req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
>      ...                       data='This data is passed to stdin of the CGI')
>      >>>  f = urllib.request.urlopen(req)
> ->>>  print(f.read())
> +>>>  print(f.read().decode('utf-8'))
>      Got Data: "This data is passed to stdin of the CGI"
>
>   The code for the sample CGI used in the above example is::
> @@ -1161,7 +1177,7 @@
>      >>>  import urllib.parse
>      >>>  params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>      >>>  f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
> ->>>  print(f.read())
> +>>>  print(f.read().decode('utf-8'))
>
>   The following example uses the ``POST`` method instead::
>
> @@ -1169,7 +1185,7 @@
>      >>>  import urllib.parse
>      >>>  params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>      >>>  f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
> ->>>  print(f.read())
> +>>>  print(f.read().decode('utf-8'))
>
>   The following example uses an explicitly specified HTTP proxy, overriding
>   environment settings::
> @@ -1178,14 +1194,14 @@
>      >>>  proxies = {'http': 'http://proxy.example.com:8080/'}
>      >>>  opener = urllib.request.FancyURLopener(proxies)
>      >>>  f = opener.open("http://www.python.org")
> ->>>  f.read()
> +>>>  f.read().decode('utf-8')
Why do some examples use print() while others don't?
>
>   The following example uses no proxies at all, overriding environment settings::
>
>      >>>  import urllib.request
>      >>>  opener = urllib.request.FancyURLopener({})
>      >>>  f = opener.open("http://www.python.org/")
> ->>>  f.read()
> +>>>  f.read().decode('utf-8')
>
>
>   :mod:`urllib.request` Restrictions
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://mail.python.org/mailman/listinfo/python-checkins
>

Best Regards,
Ezio Melotti

