How to grab HTML files behind authentication

Dirk Krause d_krause at pixelpark.com
Thu Jun 28 05:26:28 EDT 2001


Hi,

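  I've put together some code the Python community might find useful.
You can use this script to automatically fetch web pages that are protected
by HTTP Basic authentication, i.e. the pages for which a browser pops up the
WWW-Authenticate login dialog box. That makes it handy for spidering such
pages as well.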
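Behind that dialog box is a short exchange: the server answers an unauthenticated request with "401 Unauthorized" and a header such as WWW-Authenticate: Basic realm="...", and the client repeats the request with an Authorization header holding "Basic " plus the base64 encoding of "user:password". The Python 3 sketch below shows just those two pieces. The helper names parse_basic_realm and basic_auth_header are hypothetical, and the regex is an assumption that only copes with a single, simply quoted Basic challenge.

```python
import base64
import re


def parse_basic_realm(www_authenticate):
    """Return the realm of a simple 'Basic realm="..."' challenge, or None.

    Hypothetical helper: it does not handle several challenges in one
    header or escaped quotes inside the realm.
    """
    match = re.match(r'\s*Basic\s+realm="([^"]*)"', www_authenticate, re.IGNORECASE)
    return match.group(1) if match else None


def basic_auth_header(user, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    # b64encode never inserts line breaks, so long credentials stay on one line
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


print(parse_basic_realm('Basic realm="Secret Area"'))  # Secret Area
print(basic_auth_header("Aladdin", "open sesame"))     # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The "Aladdin" / "open sesame" pair is the classic example from the HTTP Basic authentication RFCs, so its encoded value is easy to check against the specification.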

Regards,
  Dirk

---snip---
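import base64
import http.client

# How to grab HTML files behind authentication
# author: Dirk Krause, 06/28/2001
# change these entries below!!

host = 'www.something.com'   # host name only: no 'http://' prefix, no path
path = '/secret/index.html'

u_name = 'myName'
u_pwd  = 'myPassword'


# ok, here goes

hlink = http.client.HTTPConnection(host)
# putrequest() appends the protocol version itself and, by default,
# also sends the Host header, so neither is added by hand here
hlink.putrequest('GET', path)

# list several acceptable types in one comma-separated Accept header
hlink.putheader('Accept', 'text/html, text/plain')

# HTTP Basic authentication: base64-encode "user:password" and prefix
# it with "Basic ". Unlike the old encodestring(), b64encode() never
# inserts line breaks, so long credentials stay on one header line.
temp = "%s:%s" % (u_name, u_pwd)
temp = base64.b64encode(temp.encode('utf-8')).decode('ascii')
temp = "Basic %s" % temp
hlink.putheader("Authorization", temp)

hlink.endheaders()

response = hlink.getresponse()
errcode, errmsg = response.status, response.reason
header = response.getheaders()
content = response.read()
hlink.close()

print(content.decode('utf-8', 'replace'))   # assumes a UTF-8 page
print(errcode, errmsg, header)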
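If you don't need control over the individual headers, the Python 3 standard library can also answer the login challenge for you: urllib.request.HTTPBasicAuthHandler waits for the server's 401 response and then retries with the stored credentials. This is a swapped-in technique, not what the script above does; the helper name make_auth_opener is hypothetical, and the URL and credentials are the same placeholders as in the script.

```python
import urllib.request


def make_auth_opener(base_url, user, password):
    """Return an opener that answers HTTP Basic challenges under base_url.

    Hypothetical helper. Passing realm=None to the password manager makes
    the credentials apply to whatever realm the server announces.
    """
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(passwords)
    return urllib.request.build_opener(handler)
```

Usage would be make_auth_opener("http://www.something.com/", "myName", "myPassword").open("http://www.something.com/secret/index.html"), whose result can be read() like any urllib response. The trade-off is one extra round trip per protected resource, since this handler only sends the credentials after the 401, whereas the hand-built Authorization header in the script sends them with the first request.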


