Need to download password-protected page with urllib
Max M
maxm at mxm.dk
Fri Oct 4 06:29:04 EDT 2002
I need to download a page for further automated processing.
The problem is that the page is password protected, which means I go to
one page and log in. I then get redirected to another page, and only then
can I go to the page I want to view.
So I guess that I log in and get a cookie returned. I should then send
this cookie back with every request.
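To make the guess above concrete, here is a minimal sketch of the cookie round trip, assuming a current Python 3 with http.cookies from the standard library (not the urllib2 of this post). The helper name cookie_header_from_set_cookie and the JSESSIONID value are made up for illustration; the real site may use different cookie names.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Build the value of a Cookie request header from Set-Cookie values.

    set_cookie_values is a list of raw Set-Cookie header values, as a
    server would send them in a response.
    """
    parsed = SimpleCookie()
    for value in set_cookie_values:
        # load() parses "name=value; attr=..." and keeps one morsel per name
        parsed.load(value)
    # Only name=value pairs go back to the server; attributes such as
    # Path are instructions for the client and are not echoed.
    return "; ".join(
        "%s=%s" % (name, morsel.value) for name, morsel in parsed.items()
    )


# Hypothetical Set-Cookie value, not taken from the real site
print(cookie_header_from_set_cookie(["JSESSIONID=abc123; Path=/"]))
```

The returned string is what would go into a "Cookie:" header on each later request, rather than into the POST data.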
I just cannot seem to find the cookie data in the header data I get
returned from the webserver.
My guess is that it has something to do with how urllib2 handles
redirects. It is rather sparsely documented :-/
Anyhoo, here is some non-working code, if anybody has a pointer.
########################
import urllib, urllib2

class Browser:

    def __init__(self, security={
            'USERNAME':'user','PASSWORD':'********'}):
        """
        Goes to the login page and gets the cookie that makes us
        known to the system.
        """
        post_data = { # pilfered from the form
            'RET':'/projects.jsp',
            'FORM_NAME':'login',
            'ok':'',
        }
        post_data.update(security)
        login_page = 'http://www.somesite.dk/login.jsp'
        encoded_post_data = urllib.urlencode(post_data)
        connection = urllib2.urlopen(login_page, encoded_post_data)
        page_info = connection.info() # why no cookie here ?
        ######
        # save cookie here
        self.cookie_dict = {}
        connection.close()

    def browse(self, url):
        "Returns content of page, after we are logged in"
        # note: this sends the cookie values as POST data,
        # not as a Cookie header
        encoded_post_data = urllib.urlencode(self.cookie_dict)
        connection = urllib2.urlopen(url, encoded_post_data)
        page_source = connection.read()
        connection.close()
        return page_source

if __name__=='__main__':
    browser = Browser()
    browser.browse('http://www.somesite.dk/page_i_really_want.jsp')
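For comparison, a hedged sketch of how the same flow might look with a cookie jar doing the bookkeeping. It assumes a current Python 3, where urllib2 has become urllib.request and the standard library ships http.cookiejar; neither is part of the code above. The names make_opener and login are made up for illustration. The idea is that the cookie processor looks at every response, including the redirect that urlopen follows silently, so a Set-Cookie on the redirect itself is not lost.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return (opener, jar); the jar collects cookies from each response."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores cookies from responses in the jar and
    # adds matching ones to later requests made through this opener.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def login(opener, login_url, fields):
    """POST the form fields and return the final URL after redirects."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    response = opener.open(login_url, data)
    try:
        return response.geturl()
    finally:
        response.close()
```

Used against the site above, that would be roughly: opener, jar = make_opener(), then login(opener, 'http://www.somesite.dk/login.jsp', post_data), then opener.open('http://www.somesite.dk/page_i_really_want.jsp').read(). Whether the site really relies on cookies is still only my guess.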
regards Max M