[Tutor] issues with urllib and loading a webpage.

Robert Sjoblom robert.sjoblom at gmail.com
Tue Aug 23 21:27:54 CEST 2011


So, an issue regarding urllib (Python 3) came up earlier. I solved it
by using httplib2 instead, but I'm rather curious as to why urllib
wouldn't work.

Here's the code I'm working with:
from http.client import HTTPConnection
HTTPConnection.debuglevel = 1
from urllib.request import urlopen

url = "http://www.boursorama.com/includes/cours/last_transactions.phtml?symbole=1xEURUS"
response = urlopen(url)
print(response.headers.as_string())
print(type(response))

Output is:

send: b'GET /includes/cours/last_transactions.phtml?symbole=1xEURUS
HTTP/1.1\r\nAccept-Encoding: identity\r\nHost:
www.boursorama.com\r\nConnection: close\r\nUser-Agent:
Python-urllib/3.2\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server header: Date header: Content-Type header: Connection
header: Cache-Control header: Pragma header: Expires header:
Set-Cookie header: Set-Cookie header: Vary header: Content-Length
header: Content-Language header: X-sid Server: nginx
Date: Tue, 23 Aug 2011 19:08:00 GMT
Content-Type: text/html; charset=ISO-8859-1
Connection: keep-alive
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Set-Cookie: OBJECT_BOURSORAMA=0; expires=Fri, 20-Aug-2021 19:07:59
GMT; path=/; domain=.www.boursorama.com
Set-Cookie: PHPSESSIONID=d6ceed9aab3dba2e61ded126a925a881; path=/;
domain=.www.boursorama.com
Vary: Accept-Encoding,User-Agent
Content-Length: 7787
Content-Language: fr
X-sid: 30,E

<class 'http.client.HTTPResponse'>

Now, if I call data = response.read() and then print(data), I should
get the page source printed (this works when tested against python.org,
btw). However, what I actually get is:
b''

What gives?
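
One way to narrow this down is to compare the Content-Length the server
declares with the number of bytes read() actually returns, and to retry
with a different User-Agent. This is only a minimal sketch: the helper
names fetch and body_lengths are made up for illustration, and the idea
that the User-Agent matters is a guess based on the "Vary:
Accept-Encoding,User-Agent" header above, not a confirmed cause.

```python
from urllib.request import Request, urlopen


def body_lengths(headers, body):
    """Return (declared, received) byte counts.

    declared is None when no Content-Length header is present.
    """
    declared = headers.get("Content-Length")
    return (int(declared) if declared is not None else None, len(body))


def fetch(url, user_agent=None, timeout=10):
    """GET url, optionally overriding the default Python-urllib User-Agent.

    Hypothetical helper: returns (status, headers, body).
    """
    request = Request(url)
    if user_agent is not None:
        request.add_header("User-Agent", user_agent)
    with urlopen(request, timeout=timeout) as response:
        body = response.read()
        return response.status, response.headers, body


# Usage (needs network access, so it is left commented out here):
# status, headers, body = fetch(url)
# print(status, body_lengths(headers, body))
# status, headers, body = fetch(url, user_agent="Mozilla/5.0")
# print(status, body_lengths(headers, body))
```

With the headers shown above and an empty body, body_lengths would
report (7787, 0), which would at least confirm that the declared body
never arrived. If the second request returns the full page, the
User-Agent is the likely difference.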

FYI, the httplib2 solution is easy enough:
import httplib2

# I prefer to use a cache folder when working with httplib2
h = httplib2.Http('.cache')
response, content = h.request(url)

Response headers are put in response; the content is accessible as a
bytes object in content.
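
Since content is bytes, it still has to be decoded before it can be
handled as text. A minimal sketch of one way to do that, using the
charset named in the Content-Type header (ISO-8859-1 in the output
above); decode_body is a made-up helper name and the UTF-8 fallback is
an assumption:

```python
from email.message import Message


def decode_body(content, content_type, default="utf-8"):
    """Decode bytes using the charset named in a Content-Type header value.

    Falls back to default when the header is missing or names no charset.
    """
    message = Message()
    if content_type:
        message["Content-Type"] = content_type
    charset = message.get_content_charset() or default
    return content.decode(charset)


# Usage with the httplib2 result (assuming the header key is stored
# lowercased as 'content-type' in the response object):
# text = decode_body(content, response.get("content-type"))
```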
-- 
best regards,
Robert S.

