[issue35457] robotparser reads empty robots.txt file as "all denied"

Karthikeyan Singaravelan report at bugs.python.org
Thu Jan 2 03:36:56 EST 2020


Karthikeyan Singaravelan <tir.karthi at gmail.com> added the comment:

There is a behavior change. parse() calls modified(), which sets the modified time (last_checked), and until that time is set, can_fetch() returns False. In Python 2, parse() was called only when the file was non-empty [0], but in Python 3 it is always called, even when the file is empty [1]. Commit 1afc1696167547a5fa101c53e5a3ab4717f8852c changed read() to always call parse(), and commit 122541beceeccce4ef8a9bf739c727ccdcbf2f28 then made parse() always call modified(). That sets the modified time, so for an empty file can_fetch() ends up returning True.
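The interplay above can be reproduced with the public urllib.robotparser API on Python 3. This is a minimal sketch: `feed_like_python2` is a hypothetical helper that only mimics the "call parse() only for non-empty files" check from the Python 2 read() linked in [0], not the whole Python 2 code path.

```python
from urllib.robotparser import RobotFileParser

URL = "http://example.com/page"

# A parser that has never parsed anything: last_checked is still 0,
# so can_fetch() refuses every URL.
fresh = RobotFileParser()
print(fresh.can_fetch("*", URL))  # False

# Python 3 behavior: an empty robots.txt still goes through parse().
# parse() calls modified(), which records a nonzero last_checked, and
# with no entries and no default entry can_fetch() falls through to True.
py3 = RobotFileParser()
py3.parse([])
print(py3.mtime() > 0)          # True
print(py3.can_fetch("*", URL))  # True


def feed_like_python2(parser, lines):
    """Hypothetical helper: only parse when the file has lines, as Python 2 did."""
    if lines:
        parser.parse(lines)


# Emulated Python 2 behavior: parse() is skipped for an empty file,
# last_checked stays 0, and everything is reported as disallowed.
py2 = RobotFileParser()
feed_like_python2(py2, [])
print(py2.can_fetch("*", URL))  # False
```

The only difference between the two outcomes is whether parse() (and therefore modified()) runs; the file content is identical.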

I think robotparser's behavior for an empty file was undefined, which allowed these changes; it would be good to have a test for this behavior.
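A test for this could look roughly like the sketch below. It pins down the current Python 3 behavior described above (an empty file allows everything); the class and method names are hypothetical, and this is not necessarily the test CPython would adopt. It feeds `[]` to parse() directly, since that is what read() ends up passing for an empty body (`"".splitlines()` is `[]`).

```python
import unittest
from urllib.robotparser import RobotFileParser


class EmptyRobotsTxtTest(unittest.TestCase):
    """Hypothetical test: how RobotFileParser treats an empty robots.txt."""

    def setUp(self):
        self.parser = RobotFileParser()
        # An empty file produces no lines at all.
        self.parser.parse([])

    def test_mtime_is_set(self):
        # parse() calls modified() even when there are no lines.
        self.assertGreater(self.parser.mtime(), 0)

    def test_everything_allowed(self):
        urls = ("http://example.com/", "http://example.com/private/x")
        for url in urls:
            with self.subTest(url=url):
                self.assertTrue(self.parser.can_fetch("*", url))
                self.assertTrue(self.parser.can_fetch("SomeBot", url))
```

Saved as, say, test_empty_robots.py (a hypothetical file name), it can be run with `python -m unittest test_empty_robots`. If the intended semantics were instead "empty means all denied", the assertions in test_everything_allowed would flip to assertFalse.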

[0] https://github.com/python/cpython/blob/f82e59ac4020a64c262a925230a8eb190b652e87/Lib/robotparser.py#L66-L67
[1] https://github.com/python/cpython/blob/149175c6dfc8455023e4335575f3fe3d606729f9/Lib/urllib/robotparser.py#L69-L70

----------
nosy: +berker.peksag, xtreak

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35457>
_______________________________________


More information about the Python-bugs-list mailing list