Need to download password-protected page with urllib

Max M maxm at mxm.dk
Fri Oct 4 06:29:04 EDT 2002


I need to download a page for further automated processing.

The problem is that the page is password protected, which means I go to 
one page and log in. I then get redirected to another page, and only then 
can I go to the page I want to view.

So I guess that I log in and get a cookie returned. I should then send 
this cookie back with every request.
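
A minimal sketch of that log-in-then-resend-the-cookie idea, assuming a 
current Python 3, where urllib2's functionality lives in urllib.request and 
a cookie jar is available as http.cookiejar. The names make_session, log_in 
and fetch are made up for illustration, and the form fields are whatever 
the real login form expects:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar); the opener stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def log_in(opener, login_url, form_fields):
    """POST the login form; any Set-Cookie headers end up in the jar."""
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    with opener.open(login_url, data) as response:
        return response.read()


def fetch(opener, url):
    """GET a page through the same opener, so stored cookies are sent."""
    with opener.open(url) as response:
        return response.read()
```

The point of the design is that every request goes through the same opener, 
so the cookie never has to be copied around by hand.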

I just cannot seem to find the cookie data in the headers I get back 
from the web server.

My guess is that it has something to do with how urllib2 handles 
redirects. It is rather sparsely documented :-/
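
If the redirect guess is right, the Set-Cookie header would sit on the 
intermediate 3xx response, which the default opener follows silently, so 
info() on the final page never shows it. A hedged sketch for looking at 
that first response instead, again assuming Python 3's urllib.request 
rather than urllib2. NoRedirect and first_response_headers are hypothetical 
names:

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "not handled"; the opener then raises
        # HTTPError for the 3xx response instead of following it.
        return None


def first_response_headers(login_url, form_fields):
    """POST the form and return (status, headers) of the first response."""
    opener = urllib.request.build_opener(NoRedirect)
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    try:
        response = opener.open(login_url, data)
    except urllib.error.HTTPError as err:
        # An unfollowed redirect arrives here, headers included.
        return err.code, err.headers
    with response:
        return response.status, response.headers
```

Calling headers.get_all("Set-Cookie") on the result would then show whether 
the login response itself carries the cookie.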

Anyhoo, here is some non-working code, if anybody has a pointer.

########################

import urllib, urllib2

class Browser:

     def __init__(self, security={
             'USERNAME':'user','PASSWORD':'********'}):
         """
         Goes to the login page and gets the cookie that makes us
         known to the system.
         """
         post_data = { # pilfered from the form
             'RET':'/projects.jsp',
             'FORM_NAME':'login',
             'ok':'',
         }
         post_data.update(security)
         login_page = 'http://www.somesite.dk/login.jsp'
         encoded_post_data = urllib.urlencode(post_data)
         connection = urllib2.urlopen(login_page, encoded_post_data)
         page_info = connection.info() # why no cookie here ?
         ######
         # save cookie here
         self.cookie_dict = {}
         connection.close()


     def browse(self, url):
         "Returns content of page, after we are logged in"
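         # cookies go back in a 'Cookie' request header, not in POST data
         cookie_header = '; '.join(
             ['%s=%s' % item for item in self.cookie_dict.items()])
         request = urllib2.Request(url, headers={'Cookie': cookie_header})
         connection = urllib2.urlopen(request)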
         page_source = connection.read()
         connection.close()
         return page_source



if __name__=='__main__':
     browser = Browser()
     browser.browse('http://www.somesite.dk/page_i_really_want.jsp')


regards Max M



