web spider and password protected pages

jdonnell jaydonnell at gmail.com
Wed Feb 16 14:03:21 EST 2005


I've been writing a simple web spider for fun, and I've run into a
problem I can't figure out. The spider hangs (waiting for a username and
password) when it hits a page that requires .htaccess authentication.

import urllib

self.f = urllib.urlopen('http://blogbloc.com/~jay/test/')
# nothing below here gets executed
print self.f.info()
...

It hangs as soon as I call urllib.urlopen(). I was going to try to read
the info and break for pages that require authentication, but it hangs
before I can call self.f.info().
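
For reference, here is a rough sketch of the check I'm after. It rests on
a few assumptions that are not part of my spider: it swaps
urllib.urlopen() for urllib2.urlopen() (urllib.request.urlopen() on
Python 3), which raises HTTPError for a 401 response instead of asking for
credentials on the terminal, and requires_auth is a made-up helper name.

```python
# Sketch only: requires_auth is a hypothetical helper, not part of the spider.
try:
    from urllib2 import urlopen, HTTPError        # Python 2
except ImportError:
    from urllib.request import urlopen            # Python 3
    from urllib.error import HTTPError


def requires_auth(url):
    """Return True if fetching url answers with HTTP 401 (auth required)."""
    try:
        response = urlopen(url)
    except HTTPError as err:
        # 401 means the server wants credentials; other HTTP errors
        # (404, 500, ...) are reported as "no auth required" here.
        return err.code == 401
    response.close()
    return False
```

The spider could then skip any URL for which requires_auth(url) returns
True rather than blocking on a prompt.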

Any ideas?



