Web authentication urllib2

Steve Holden steve at holdenweb.com
Sat Jan 24 05:36:58 EST 2009


Gabriel wrote:
> Hello,
> 
> I'm new to Python and I would like to write a script that needs to log in
> to a website. I'm experimenting with urllib2,
> especially with something like this:
> 
>     import urllib, urllib2
>
>     opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
>     urllib2.install_opener(opener)
> 
>     params = urllib.urlencode(dict(username='user', password='pass'))
>     f = opener.open('https://web.com', params)
>     data = f.read()
>     f.close()
> 
> The problem is that this code logs me in on some sites but not on
> others, especially not on the one I really
> need to log in to, and I don't know why. Is there some way to debug
> this code and find out why the script cannot
> log in on that specific site?
> 
> Sorry if this question is too lame, but I am really a beginner in both
> Python and web programming .)
> 
That's actually pretty good code for a newcomer! There are a couple of
issues you may be running into.

First, not all sites use "application-based" authentication - they may
use HTTP authentication of some kind instead. In that case you have to
pass the username and password as a part of the HTTP headers. Michael
Foord has done a fair write-up of the issues at

  http://www.voidspace.org.uk/python/articles/authentication.shtml

and you will do well to read that if, indeed, you need to do basic
authentication.
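As a rough illustration of the basic authentication case, here is a minimal
sketch. It is written for Python 3, where urllib2 became urllib.request; the
helper names, the URL and the credentials are placeholders of mine, not
anything the failing site is known to use.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the Authorization header value used by HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_basic_auth_opener(top_level_url, username, password):
    """Build an opener that answers 401 challenges for top_level_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for any realm under this URL
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


# Placeholder URL and credentials; nothing is sent until opener.open() is called.
opener = build_basic_auth_opener("https://web.com/", "user", "pass")
```

The first helper only shows what ends up in the header; the opener adds that
header itself when the server replies with a 401 challenge.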

Second, if it *is* the web application that's doing the authentication
in the sites that are failing (in other words if the credentials are
passed in a web form) then your code may need adjusting to use other
field names, or to include other data as required by the login form. You
can usually find out what's required by reading the HTML source of the
page that contains the login form.
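If reading the HTML by eye gets tedious, something along these lines can list
the fields for you. It is only a sketch in Python 3: the sample form, its
field names ("login", "passwd", "token") and the FormFieldParser class are
invented for illustration, and it naively collects the inputs of every form
on the page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect the first form's action and every named <input> on the page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            # Keep default values: hidden fields often must be sent back as-is
            self.fields[attrs["name"]] = attrs.get("value") or ""


# Invented login page; a real one would come from opener.open(url).read()
SAMPLE_HTML = """
<form action="/login.php" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="passwd">
  <input type="submit" name="go" value="Log in">
</form>
"""

parser = FormFieldParser()
parser.feed(SAMPLE_HTML)

fields = parser.fields
fields.update(login="user", passwd="pass")    # fill in the credentials
params = urlencode(fields).encode("ascii")    # POST body for opener.open()
```

Note that the field names here are not "username" and "password", and that
the form posts to its action URL rather than to the page that displays it.
Both are common reasons for a scripted login to fail.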

Thirdly [nobody expects the Spanish Inquisition ...], it may be that
some sites are extraordinarily sensitive to programmed login attempts
(possibly due to spam), typically using a check of the "User-Agent:" HTTP
header to "make sure" that the login attempt is coming from a browser
and not a program. For sites like these you may need to emulate a
browser request more fully.
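A minimal sketch of sending browser-like headers, again in Python 3. The
User-Agent string, the Referer value, the URL and the build_login_request
helper are all assumptions for illustration; copy the real values from what
your own browser sends.

```python
import urllib.parse
import urllib.request

# Example header values only; take the real ones from your browser.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) "
                  "Gecko/20100101 Firefox/115.0",
    "Referer": "https://web.com/login",
}


def build_login_request(url, fields, headers=BROWSER_HEADERS):
    """Build a POST request carrying the form fields and browser-like headers."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET
    return urllib.request.Request(url, data=data, headers=headers)


request = build_login_request(
    "https://web.com/login",
    {"username": "user", "password": "pass"},
)
# Pass the request to opener.open(request) in place of a plain URL.
```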

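To see what a browser actually sends when you log in by hand, you can use
a program like Wireshark to analyze the network traffic, or you can get
add-ons for Firefox that will show you the HTTP headers on request and
response. Comparing those with what your script sends usually shows what
is missing.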
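You can also get the script itself to show what it sends and receives. The
sketch below (Python 3, with a hypothetical build_debug_opener helper) turns
on the debug output of the HTTP handlers and keeps the cookie jar around so
you can inspect it after the login attempt.

```python
import http.cookiejar
import urllib.request


def build_debug_opener():
    """Return an opener that prints HTTP traffic, plus its cookie jar."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        # debuglevel=1 prints request and response headers to stdout
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
        urllib.request.HTTPCookieProcessor(cookie_jar),
    )
    return opener, cookie_jar


opener, cookie_jar = build_debug_opener()

# After a login attempt such as opener.open(url, params), list the cookies:
#     for cookie in cookie_jar:
#         print(cookie.name, cookie.value)
```

If the jar is still empty after the login POST, the site probably never set
a session cookie, which suggests that the request itself was rejected.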

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/



