[Python-checkins] r80092 - python/branches/py3k/Doc/library/urllib.request.rst

Senthil Kumaran orsenthil at gmail.com
Mon Apr 19 10:12:52 CEST 2010


On Sat, Apr 17, 2010 at 12:05:00PM -0400, R. David Murray wrote:
> 
> Senthil, I think that we are in general considering Python 3 a "clean
> start", and avoiding mentioning how things were done in Python 2 except
> where it is important for compatibility (e.g. pickle).  I think the
> mention of how Python 2 did it actually muddies the explanation of how
> one should do it.  I would either drop the mention of Python 2, or
> move it to a footnote (I favor just dropping it).
> 
> How about this:
> 
> Note that urlopen returns a bytes object.  This is because there is no way
> for urlopen to automatically determine the encoding of the byte stream
> it receives from the HTTP server.  In general, a program will decode
> the returned bytes object to string once it determines or guesses
> the appropriate encoding.
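Here is a minimal sketch of the decoding step described above. It assumes that the charset parameter of the server's Content-Type header can be trusted and that UTF-8 is an acceptable fallback. Both are assumptions; urlopen guarantees neither. Strictly speaking, urlopen returns a response object whose read() method returns the bytes, and resp.headers is an http.client.HTTPMessage, which inherits get_content_charset() from email.message.Message. The helper names decode_response_body and fetch_text are hypothetical.

```python
import email.message
from urllib.request import urlopen


def decode_response_body(raw: bytes, headers: email.message.Message,
                         fallback: str = "utf-8") -> str:
    """Decode an HTTP body using the charset from its Content-Type header.

    Hypothetical helper: trusts the header if present and known to Python,
    otherwise decodes with `fallback` (UTF-8 is an assumed default).
    """
    # get_content_charset() returns the lower-cased charset parameter, or None
    charset = headers.get_content_charset()
    try:
        return raw.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that do not match the declared charset:
        # decode with the fallback and substitute U+FFFD for undecodable bytes.
        return raw.decode(fallback, errors="replace")


def fetch_text(url: str) -> str:
    # Hypothetical convenience wrapper; this performs a real network request.
    with urlopen(url) as resp:
        # resp.read() yields bytes; resp.headers is an http.client.HTTPMessage
        return decode_response_body(resp.read(), resp.headers)
```

Falling back with errors="replace" keeps the program running on mislabelled pages at the cost of silently losing some characters. A stricter program might prefer to let the UnicodeDecodeError propagate.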

Yes, I get your point, David. My write-up was aimed at the specific bug,
where the request was for the note to be explicit and helpful to
newcomers. Perhaps the urllib2 how-to tutorial can provide the specific
details, and this note can be written along the lines you have
suggested.

> 
> Aside: I was curious how one went about determining the encoding, and
> found this fascinating document that seems to show just how non-trivial
> doing so is:
> 
>   http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html
> 
> And I thought email was a pain to parse.  Little did I know.


This is an interesting look at how other clients approach guessing the
correct encoding.
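The WHATWG document linked above defines the full encoding-sniffing algorithm. The sketch below is a heavily simplified approximation of its order of precedence. It assumes that only three signals matter: a byte order mark, the transport-layer charset (for example from the Content-Type header), and a `<meta ... charset=...>` declaration within the first 1024 bytes. The regular expression is a crude stand-in for the spec's real prescan. The spec's label table is ignored (for example, it treats "iso-8859-1" as windows-1252). Using windows-1252 as the final default is an assumption, and sniff_encoding is a hypothetical name.

```python
import codecs
import re
from typing import Optional

# Byte order marks are checked first; UTF-32 is omitted for brevity.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Crude approximation of the spec's <meta> prescan (an assumption, not the
# real algorithm): finds charset=... inside a <meta> tag.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def _python_codec(label: str) -> Optional[str]:
    """Return Python's canonical codec name for label, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def sniff_encoding(raw: bytes, header_charset: Optional[str] = None,
                   default: str = "windows-1252") -> str:
    # 1. A byte order mark overrides every other signal.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # 2. The transport-layer charset (e.g. from Content-Type), if Python knows it.
    if header_charset:
        name = _python_codec(header_charset)
        if name:
            return name
    # 3. A <meta> charset declaration within the first 1024 bytes.
    match = _META_CHARSET.search(raw[:1024])
    if match:
        name = _python_codec(match.group(1).decode("ascii"))
        if name:
            return name
    # 4. Nothing found: use the assumed default.
    return default
```

With urllib.request, one would pass the body from resp.read() together with resp.headers.get_content_charset(), then call raw.decode(encoding, errors="replace"). The errors="replace" argument matters because Python's windows-1252 codec leaves a few byte values undefined.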

-- 
Senthil


