Python "robots.txt" parser broken since 2003

John Nagle nagle at animats.com
Sat Apr 21 15:22:13 EDT 2007


This bug, "[ 813986 ] robotparser interactively prompts for username and 
password", has been open since 2003.  It killed a big batch job of ours
last night.

Module "robotparser" naively uses "urlopen" to read "robots.txt" URLs.
If the server asks for basic authentication on that file, "robotparser"
prompts for the password on standard input, which is rarely what you
want.  You can demonstrate this with:

import robotparser
url = 'http://mueblesmoraleda.com' # this site is password-protected.
parser = robotparser.RobotFileParser()
parser.set_url(url)
parser.read()  # prompts for a password on standard input

That's the standard, although silly, "urllib" behavior.

This was reported in 2003, and a patch was uploaded in 2005, but the patch
never made it into Python 2.4 or 2.5.

A temporary workaround is this:

import robotparser

def prompt_user_passwd(self, host, realm):
    # Supply no credentials, so urllib never prompts on standard input.
    return None, None

robotparser.URLopener.prompt_user_passwd = prompt_user_passwd  # temp patch
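
Another way to stay clear of the prompt, sketched here as an assumption
rather than anything taken from the bug report, is to fetch "robots.txt"
with your own HTTP code and hand the text to RobotFileParser.parse(), which
does no network I/O.  The helper name parser_from_text and the sample rules
are made up for illustration; the import fallback is only there so the
sketch also runs where the module lives at urllib.robotparser.

```python
try:
    import robotparser                        # Python 2 module name
except ImportError:
    import urllib.robotparser as robotparser  # Python 3 location


def parser_from_text(robots_txt):
    """Hypothetical helper: build a parser from text the caller already fetched."""
    parser = robotparser.RobotFileParser()
    # parse() only looks at the lines it is given, so nothing here can
    # trigger an HTTP request or a password prompt.
    parser.parse(robots_txt.splitlines())
    return parser


# Made-up rules standing in for a real robots.txt body.
sample = "User-agent: *\nDisallow: /private/\n"
rules = parser_from_text(sample)

print(rules.can_fetch("*", "http://example.com/private/page.html"))
print(rules.can_fetch("*", "http://example.com/index.html"))
```

How to treat a 401 on the "robots.txt" fetch itself is then up to the
caller's own HTTP code, instead of being decided by "urllib".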


					John Nagle


