[New-bugs-announce] [issue35457] robotparser reads empty robots.txt file as "all denied"

larsfuse report at bugs.python.org
Tue Dec 11 04:30:47 EST 2018


New submission from larsfuse <lars at cl.no>:

The standard (http://www.robotstxt.org/robotstxt.html) says:

> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)

Here I give Python an empty robots.txt file:
$ curl http://10.223.68.186/robots.txt
$

Code:

import robotparser

robotsurl = "http://10.223.68.186/robots.txt"
rp = robotparser.RobotFileParser()
print(robotsurl)
rp.set_url(robotsurl)
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))

Result:

$ ./test.py
http://10.223.68.186/robots.txt
('fetch /', False)
('fetch /admin', False)

As the output shows, robotparser treats the entire site as disallowed, even though the robots.txt file is empty and the standard says an empty file means complete access.
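
As a side note, a minimal caller-side workaround sketch until this is fixed (not part of the report; the example URL and the helper name can_fetch are hypothetical) is to fetch robots.txt yourself and only hand it to RobotFileParser when the body is non-empty, so an empty or missing file is treated as "allow all" per the spec quoted above:

import urllib2
import robotparser

def can_fetch(robotsurl, useragent, url):
    # Fetch robots.txt ourselves so the raw body can be inspected.
    try:
        body = urllib2.urlopen(robotsurl).read()
    except urllib2.URLError:
        # No robots.txt at all: the spec says everything is allowed.
        return True
    if not body.strip():
        # Empty file: the spec also says all robots have complete access.
        return True
    # Otherwise fall back to the normal parser.
    rp = robotparser.RobotFileParser()
    rp.parse(body.splitlines())
    return rp.can_fetch(useragent, url)

print(can_fetch("http://example.com/robots.txt", "*", "/"))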

----------
components: Library (Lib)
messages: 331595
nosy: larsfuse
priority: normal
severity: normal
status: open
title: robotparser reads empty robots.txt file as "all denied"
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35457>
_______________________________________