[Python-bugs-list] [ python-Bugs-232000 ] New robotparser fails for non-HTTP schemes
nobody
nobody@sourceforge.net
Mon, 26 Feb 2001 06:38:08 -0800
Artifact #232000, was updated on 2001-02-12 07:18
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=232000&group_id=5470
Category: Python Library
Group: None
Status: Open
Priority: 7
Submitted By: Fred L. Drake, Jr.
Assigned to: Skip Montanaro
Summary: New robotparser fails for non-HTTP schemes
Initial Comment:
The new robotparser module fails for non-HTTP URLs that the old one handled. In particular, file: URLs now raise an exception (socket.error: (111, 'Connection refused')), which the old robotparser did not do.
This is due, at least in part, to the current code using httplib directly rather than going through urllib, which can handle schemes other than HTTP. The code should be changed accordingly.
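As a rough illustration of the suggested direction (not the actual patch), the sketch below fetches robots.txt through urllib's scheme-aware opener instead of opening an HTTP connection by hand. It is written for modern Python 3, where the modules are named urllib.request and urllib.robotparser; the report itself concerns Python 2.1. The helper name load_robots is hypothetical, and the sketch assumes the file is UTF-8 text.

```python
# Sketch only: modern Python 3 module names, not the Python 2.1 code
# discussed in this report.
import urllib.request
import urllib.robotparser


def load_robots(robots_url):
    """Fetch and parse a robots.txt file from any scheme urllib supports.

    load_robots is a hypothetical helper, not a standard-library function.
    """
    parser = urllib.robotparser.RobotFileParser()
    # urlopen dispatches on the URL scheme, so a file: URL is read from
    # disk rather than triggering a socket connection.
    with urllib.request.urlopen(robots_url) as response:
        # Assumption: the robots.txt content is UTF-8 encoded text.
        lines = response.read().decode("utf-8").splitlines()
    parser.parse(lines)
    return parser
```

The returned parser can then be queried with can_fetch(useragent, url) in the usual way, whether the rules came from an http: or a file: URL.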
A good test case for this is running webchecker on a local tree of HTML files. I currently get the exception:
cj42289-a(.../Doc/html); python ../../Tools/webchecker/webchecker.py -x file://`pwd`/api/
webchecker version 1.22
Traceback (most recent call last):
  File "../../Tools/webchecker/webchecker.py", line 824, in ?
    main()
  File "../../Tools/webchecker/webchecker.py", line 205, in main
    c.addroot(arg)
  File "../../Tools/webchecker/webchecker.py", line 324, in addroot
    self.addrobot(root)
  File "../../Tools/webchecker/webchecker.py", line 337, in addrobot
    rp.read()
  File "/usr/local/lib/python2.1/robotparser.py", line 46, in read
    connection.putrequest("GET", self.path)
  File "/usr/local/lib/python2.1/httplib.py", line 426, in putrequest
    self.send(str)
  File "/usr/local/lib/python2.1/httplib.py", line 368, in send
    self.connect()
  File "/usr/local/lib/python2.1/httplib.py", line 352, in connect
    self.sock.connect((self.host, self.port))
socket.error: (111, 'Connection refused')
Assigned to Skip since he's the robots.txt guru.
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr.
Date: 2001-02-26 06:38
Message:
Logged In: YES
user_id=3066
Skip, is this fixed now?
----------------------------------------------------------------------