web spider and password protected pages

Peter Hansen peter at engcorp.com
Wed Feb 16 15:48:41 EST 2005


jdonnell wrote:
> I've been writing a simple web spider for fun, and I've run into a
> problem I can't figure out. The spider hangs (waits for username and
> pass) when I hit a page that requires .htaccess authentication.
> 
> self.f = urllib.urlopen('http://blogbloc.com/~jay/test/')
> #nothing below here gets executed
> print self.f.info()
> ...
> 
> It hangs as soon as I call urllib.urlopen(). I was going to try to read
> the info and break for pages that require authentication, but it hangs
> before I can call self.f.info()
> 
> Any ideas?

I tried Google.  First I looked for "python urlopen authentication".
I scanned the top page for the word "authentication" and found a
few references, then something called FancyURLopener.  Adding that
to my search, skipping down a couple of links, I quickly found
a page that starts "Here is an explanation about how to handle password
protected sites."

Another approach that often works is to throw in the word
"recipe", hoping perhaps to get a hit in the Python Cookbook
page: try "python http authentication recipe", for example.

I hope that teaches you a bit about how to fish, rather than
just giving you one. ;-)
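
For reference, a rough sketch of where those searches lead.  In the
urllib of that era, urlopen() goes through FancyURLopener, whose
prompt_user_passwd() method asks for a username and password on the
terminal, which is why the spider appears to hang.  The sketch below
is not that Python 2 code: it assumes present-day Python 3, where
urllib.request.urlopen() raises HTTPError with code 401 instead of
prompting.  The function name needs_auth and its opener and timeout
parameters are made up for illustration; opener exists only so the
function can be tried without touching the network.

```python
import urllib.error
import urllib.request


def needs_auth(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if fetching `url` is answered with HTTP 401.

    `opener` is a hypothetical injection point for illustration;
    it defaults to the real urllib.request.urlopen.
    """
    try:
        response = opener(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # The server wants credentials; report it instead of prompting.
            return True
        raise  # any other HTTP error is left for the caller to handle
    response.close()
    return False
```

In a spider loop this could be used as "if needs_auth(url): continue"
to skip protected pages.  A real spider would more likely fetch the
page once and catch the 401 around that single request, rather than
fetching every URL twice.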

-Peter


