[Tutor] Strategy to read a redirecting html page
Alexandre Conrad
alexandre.conrad at gmail.com
Wed Jun 1 01:41:02 CEST 2011
Hi Karim,
When you hit the page and get an HTTP redirect code back (say, 302),
you need to make another request to the URL given in the "Location"
header of the response. Retrieve that new page, check that you got an
acceptable HTTP response code (such as 200), and read the page's body
(or whatever you want to do with it). If you get another redirect
instead, keep looping until you get an expected HTTP response code.
Note: you may get stuck in an infinite loop if two URLs redirect to each
other, so cap the number of redirects you are willing to follow.
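
Here is a rough sketch of that loop, in case it helps. It is written for
Python 3, where httplib is named http.client; on 2.7 the imports would be
httplib and urlparse instead. The names fetch_once, follow_redirects,
REDIRECT_CODES and the max_redirects limit of 10 are my own choices for
illustration, not anything from a library. It also ignores authentication
and assumes a plain GET is enough, so treat it as a starting point only.

```python
import http.client  # named httplib in Python 2
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch (an assumption; adjust as needed).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_once(url):
    """Issue a single GET without following redirects.

    Returns a (status, location, body) tuple; location is None when the
    response has no Location header.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location"), response.read()
    finally:
        conn.close()


def follow_redirects(url, fetch=fetch_once, max_redirects=10):
    """Follow Location headers until a non-redirect response arrives.

    Raises RuntimeError on a redirect loop, on a redirect that carries no
    Location header, or after more than max_redirects redirects.
    """
    seen = set()
    for _ in range(max_redirects + 1):
        if url in seen:
            raise RuntimeError("redirect loop detected at %s" % url)
        seen.add(url)
        status, location, body = fetch(url)
        if status not in REDIRECT_CODES:
            return url, status, body
        if not location:
            raise RuntimeError("redirect without a Location header at %s" % url)
        url = urljoin(url, location)  # the Location value may be relative
    raise RuntimeError("gave up after %d redirects" % max_redirects)
```

Calling follow_redirects(some_url) returns the final URL, the status code
and the body, so you can still check for 200 yourself. The fetch argument
is only there so the loop can be tried with a fake function instead of a
real network call.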
You might want to take a look at the lower-level httplib module (it is
what urllib2 is built on):
http://docs.python.org/library/httplib.html
I don't think it can automatically follow redirects for you, though, so
you'll have to implement the loop yourself.
If you can rely on third-party packages (not part of the standard Python
library), take a look at httplib2:
https://httplib2.googlecode.com/hg/doc/html/libhttplib2.html
This one can follow redirects.
HTH,
2011/5/31 Karim <karim.liateni at free.fr>:
>
> Hello,
>
> I am having an issue reading an HTML page that redirects to a new page.
> I get the first warning/error message page and not the page it redirects to.
> Should I request the same URL a second time, or should I loop until the
> page content is the correct one (checked by parsing it)?
> Do you have a better strategy, or are there modules that deal with this issue?
> I am using Python 2.7.1 on Ubuntu Linux 11.04 and the modules urllib2,
> urllib, etc.
> The web page is secured, but I registered a password manager.
>
> cheers
> karim
--
Alex | twitter.com/alexconrad