[issue35457] robotparser reads empty robots.txt file as "all denied"

Andre Burgaud report at bugs.python.org
Wed Jan 1 22:41:13 EST 2020


Andre Burgaud <andre.burgaud at gmail.com> added the comment:

Hi,

Is this ticket still relevant for Python 3.8?

While running some tests with an empty robots.txt file, I observed that it returns "ALLOWED" for any path, which matches the current draft of the Robots Exclusion Protocol: https://tools.ietf.org/html/draft-koster-rep-00#section-2.2.1

Code:

from urllib import robotparser

robots_url = "file:///tmp/empty.txt"

rp = robotparser.RobotFileParser()
print(robots_url)
rp.set_url(robots_url)
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))

Output:

$ cat /tmp/empty.txt
$ python -V
Python 3.8.1
$ python test_robot3.py
file:///tmp/empty.txt
fetch / True
fetch /admin True
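For contrast, a minimal sketch showing that the "all allowed" result above comes from the absence of rules, not from a parsing failure: feeding RobotFileParser.parse() an explicit deny-all robots.txt flips the answer to False. (This uses parse() directly with in-memory lines instead of a file:// URL, which avoids touching the filesystem; the paths and user agent are illustrative.)

```python
from urllib import robotparser

# Empty robots.txt: no rules at all, so every path is allowed.
empty = robotparser.RobotFileParser()
empty.parse([])
print("empty, fetch /admin:", empty.can_fetch("*", "/admin"))   # True

# Explicit deny-all robots.txt: every path is disallowed.
deny = robotparser.RobotFileParser()
deny.parse(["User-agent: *", "Disallow: /"])
print("deny-all, fetch /admin:", deny.can_fetch("*", "/admin"))  # False
```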

----------
nosy: +gallicrooster

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35457>
_______________________________________
