[Python-bugs-list] [Bug #132000] New robotparser fails for non-HTTP schemes
noreply@sourceforge.net
Mon, 12 Feb 2001 07:18:40 -0800
Bug #132000, was updated on 2001-Feb-12 07:18
Here is a current snapshot of the bug.
Project: Python
Category: Python Library
Status: Open
Resolution: None
Bug Group: None
Priority: 7
Submitted by: fdrake
Assigned to: montanaro
Summary: New robotparser fails for non-HTTP schemes
Details: The new robotparser module fails for non-HTTP URLs where the old
one did not. In particular, a file: URL causes an exception to be raised
(socket.error: (111, 'Connection refused')) where the old robotparser
succeeded.
This is due, at least in part, to the current code using httplib directly
rather than urllib, which handles multiple URL schemes. The code should be
changed accordingly.
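The suggested direction can be sketched in modern terms (Python 3, where
httplib and urllib became http.client and urllib.request): urllib's opener
dispatches on the URL scheme, so a file: URL is served from the local
filesystem with no socket involved, which is exactly the flexibility a raw
httplib connection lacks. The robots.txt content and temporary paths below
are invented for illustration:

```python
import os
import tempfile
import urllib.request

# urllib dispatches on the URL scheme (file:, http:, ftp:, ...),
# so fetching a file: URL never attempts a socket connection.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "robots.txt")
    with open(path, "w") as f:
        f.write("User-agent: *\nDisallow: /private/\n")
    with urllib.request.urlopen("file://" + path) as resp:
        lines = resp.read().decode().splitlines()

print(lines)  # the robots.txt content, read without any network I/O
```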
A good test case for this is running webchecker on a local tree of HTML
files. I currently get the exception:
cj42289-a(.../Doc/html); python ../../Tools/webchecker/webchecker.py -x
file://`pwd`/api/
webchecker version 1.22
Traceback (most recent call last):
  File "../../Tools/webchecker/webchecker.py", line 824, in ?
    main()
  File "../../Tools/webchecker/webchecker.py", line 205, in main
    c.addroot(arg)
  File "../../Tools/webchecker/webchecker.py", line 324, in addroot
    self.addrobot(root)
  File "../../Tools/webchecker/webchecker.py", line 337, in addrobot
    rp.read()
  File "/usr/local/lib/python2.1/robotparser.py", line 46, in read
    connection.putrequest("GET", self.path)
  File "/usr/local/lib/python2.1/httplib.py", line 426, in putrequest
    self.send(str)
  File "/usr/local/lib/python2.1/httplib.py", line 368, in send
    self.connect()
  File "/usr/local/lib/python2.1/httplib.py", line 352, in connect
    self.sock.connect((self.host, self.port))
socket.error: (111, 'Connection refused')
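For comparison, the robotparser in today's standard library
(urllib.robotparser) fetches robots.txt through urllib.request, so read()
on a file: URL succeeds where the traceback above shows the httplib-based
version dying in connect(). A minimal sketch; the user agent string, rules,
and paths are invented for the example:

```python
import os
import tempfile
import urllib.robotparser

# Modern urllib.robotparser reads robots.txt via urllib.request, so a
# file: URL is parsed from disk with no socket connection attempted.
with tempfile.TemporaryDirectory() as tmp:
    robots = os.path.join(tmp, "robots.txt")
    with open(robots, "w") as f:
        f.write("User-agent: *\nDisallow: /private/\n")
    rp = urllib.robotparser.RobotFileParser("file://" + robots)
    rp.read()  # would have raised socket.error in the httplib version
    allowed = rp.can_fetch("webchecker", "/api/")
    denied = rp.can_fetch("webchecker", "/private/secret.html")

print(allowed, denied)  # True False
```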
Assigned to Skip since he's the robots.txt guru.
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=132000&group_id=5470