[Python-bugs-list] [Bug #132000] New robotparser fails for non-HTTP schemes

noreply@sourceforge.net
Mon, 12 Feb 2001 07:18:40 -0800


Bug #132000 was updated on 2001-Feb-12 07:18.
Here is a current snapshot of the bug.

Project: Python
Category: Python Library
Status: Open
Resolution: None
Bug Group: None
Priority: 7
Submitted by: fdrake
Assigned to: montanaro
Summary: New robotparser fails for non-HTTP schemes

Details: The new robotparser module fails for non-HTTP URLs where the old
one did not.  In particular, file: URLs cause an exception to be raised
(socket.error: (111, 'Connection refused')) where the old robotparser did
not fail.

This is due, at least in part, to the current code using httplib directly
rather than going through urllib, which would handle other schemes.  The
code should be changed accordingly.

A good test case for this is running webchecker on a local tree of HTML
files.  I currently get the exception:

cj42289-a(.../Doc/html); python ../../Tools/webchecker/webchecker.py -x
file://`pwd`/api/
webchecker version 1.22
Traceback (most recent call last):
  File "../../Tools/webchecker/webchecker.py", line 824, in ?
    main()
  File "../../Tools/webchecker/webchecker.py", line 205, in main
    c.addroot(arg)
  File "../../Tools/webchecker/webchecker.py", line 324, in addroot
    self.addrobot(root)
  File "../../Tools/webchecker/webchecker.py", line 337, in addrobot
    rp.read()
  File "/usr/local/lib/python2.1/robotparser.py", line 46, in read
    connection.putrequest("GET", self.path)
  File "/usr/local/lib/python2.1/httplib.py", line 426, in putrequest
    self.send(str)
  File "/usr/local/lib/python2.1/httplib.py", line 368, in send
    self.connect()
  File "/usr/local/lib/python2.1/httplib.py", line 352, in connect
    self.sock.connect((self.host, self.port))
socket.error: (111, 'Connection refused')
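The failure can also be reproduced without webchecker, e.g. (the file: path
below is just illustrative):

    import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("file:///tmp/htdocs/robots.txt")  # any local path will do
    rp.read()    # raises socket.error with the httplib-based code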

Assigned to Skip since he's the robots.txt guru.

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=132000&group_id=5470