newbie: HTTPS screen scraping

John Nagle nagle at animats.com
Sat Apr 21 14:36:06 EDT 2007


user at domain.invalid wrote:
> Hi,
>    Can anyone help me out here? I would like to authenticate myself to a
> website which uses HTTPS and then, after authentication, I would like to
> get the contents of the webpage. How can this be done using Python?
> I have tried urllib and urllib2, but they have not solved my problem.
> 
> TIA
> /varun

    Most of the URL libraries, including urllib, urllib2, and pycurl,
can do this.

    With "urllib", you can subclass FancyURLopener, then redefine
"get_user_passwd(self, host, realm, clear_cache=0)" in the
subclass. That method is called when the server asks for HTTP
authentication; return (username, password) as a tuple, and it is
sent to the web server.
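
    For the urllib2 route, here is a minimal sketch, written against
urllib.request (the name urllib2 has in Python 3, where FancyURLopener
is deprecated). It assumes the site uses HTTP Basic authentication
rather than a login form. The function names make_password_manager,
build_auth_opener, and fetch_page are made up for illustration.

```python
import urllib.request


def make_password_manager(top_url, username, password):
    """Register one username/password for every URL under top_url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials whatever realm the server names.
    manager.add_password(None, top_url, username, password)
    return manager


def build_auth_opener(top_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges (401)."""
    manager = make_password_manager(top_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)


def fetch_page(opener, url):
    """Fetch url through the opener and return the response body as bytes."""
    with opener.open(url) as response:
        return response.read()
```

    Something like build_auth_opener("https://example.com/protected/",
"varun", "secret") followed by fetch_page(opener, url) should then
return the page. The password is only offered for URLs under the
registered prefix, which is why the manager takes top_url.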

    Python, instead of having one library for reading URLs that works,
has at least three, all with different problems.

				John Nagle


