[Tutor] Strategy to read a redirecting html page

Alexandre Conrad alexandre.conrad at gmail.com
Wed Jun 1 01:41:02 CEST 2011


Hi Karim,

When you request the page and get an HTTP redirect status back (say,
302), you need to make another request to the URL given in the
"Location" header of the response. Then retrieve that new page, check
that you got an acceptable HTTP status code (such as 200), and read the
page's body (or do whatever you want with it). If you get yet another
redirect instead, repeat the process until you receive a status code
you expect.

Note: you may get stuck in an infinite loop if two URLs redirect to each
other, so put a limit on the number of redirects you are willing to follow.
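The loop described above could look roughly like this in Python. This is a minimal sketch, not a drop-in solution: `follow_redirects`, `TooManyRedirects` and the `fetch` callable are hypothetical names made up for this example, and `fetch(url)` is assumed to return `(status, headers, body)` with lower-case header names. It resolves relative "Location" values against the current URL, caps the number of redirects, and treats revisiting a URL as a loop.

```python
try:
    from urlparse import urljoin          # Python 2.7
except ImportError:
    from urllib.parse import urljoin      # Python 3

# Status codes whose Location header points at the next URL to request.
REDIRECT_CODES = (301, 302, 303, 307, 308)


class TooManyRedirects(Exception):
    """Raised when the redirect chain is too long or loops back on itself."""


def follow_redirects(fetch, url, max_redirects=10):
    """Follow HTTP redirects starting at `url`.

    `fetch` is a hypothetical callable: fetch(url) -> (status, headers, body),
    where `headers` is a dict with lower-case keys.
    Returns (final_url, status, body).
    """
    seen = set()
    # One initial request plus at most `max_redirects` follow-up requests.
    for _ in range(max_redirects + 1):
        if url in seen:
            # Two (or more) URLs redirecting to each other.
            raise TooManyRedirects("redirect loop at %s" % url)
        seen.add(url)
        status, headers, body = fetch(url)
        if status not in REDIRECT_CODES:
            return url, status, body
        location = headers.get("location")
        if not location:
            # A redirect status without a Location header: nothing to follow.
            return url, status, body
        # Location may be relative (e.g. "/login"), so resolve it first.
        url = urljoin(url, location)
    raise TooManyRedirects("more than %d redirects" % max_redirects)
```

Passing the request function in as a parameter keeps the loop independent of whichever HTTP module you use, and makes it easy to test with canned responses.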

You might want to take a look at the lower-level httplib module (which
urllib2 is built on):
http://docs.python.org/library/httplib.html

httplib does not follow redirects automatically, so you'll have to
implement the loop yourself.
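For reference, a single request with httplib might look like the sketch below. It assumes a plain GET with no authentication or cookies, so it does not cover the password manager registered with urllib2; `fetch_once` is a hypothetical helper name. The point is that httplib simply reports the 302 and its "Location" header, and it is up to you to request that URL next. The module is called `http.client` in Python 3, hence the import fallback.

```python
try:
    import httplib as http_client          # Python 2.7
    from urlparse import urlsplit
except ImportError:
    import http.client as http_client      # Python 3 name for httplib
    from urllib.parse import urlsplit


def fetch_once(url, timeout=10):
    """Make one GET request with httplib and do NOT follow redirects.

    Returns (status, location, body); `location` is None unless the
    server sent a Location header (e.g. on a 301/302 response).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http_client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http_client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    # Rebuild the request target from the path and query string.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        # httplib hands back the raw status: a 302 is just another response.
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()
```

Calling it repeatedly, feeding `location` back in (resolved with `urljoin` if it is relative) until the status is no longer a 3xx, gives you the redirect-following behaviour.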

If you can rely on third-party packages (not part of the standard Python
library), take a look at httplib2:
https://httplib2.googlecode.com/hg/doc/html/libhttplib2.html

It can follow redirects for you.

HTH,

2011/5/31 Karim <karim.liateni at free.fr>:
>
> Hello,
>
> I am having an issue reading an HTML page that redirects to a new page.
> I get the first warning/error message page and not the page it redirects to.
> Should I request the same URL a second time, or should I keep looping
> until the page content is the correct one (by parsing it)?
> Do you have a better strategy, or do some modules deal with this issue?
> I am using Python 2.7.1 on Ubuntu Linux 11.04 with the modules urllib2,
> urllib, etc.
> The webpage is secured, but I registered a password manager.
>
> cheers
> karim
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>



-- 
Alex | twitter.com/alexconrad

