[issue35457] robotparser reads empty robots.txt file as "all denied"
Karthikeyan Singaravelan
report at bugs.python.org
Thu Jan 2 03:36:56 EST 2020
Karthikeyan Singaravelan <tir.karthi at gmail.com> added the comment:
There is a behavior change. parse() sets the modified time, and can_fetch() returns False unless the modified time has been set. In Python 2, parse() was called only when the file was non-empty [0], but in Python 3 it is always called, even when the file is empty [1]. The change was made in 1afc1696167547a5fa101c53e5a3ab4717f8852c, which always calls parse(), and then in 122541beceeccce4ef8a9bf739c727ccdcbf2f28, which calls modified() on every parse(). Together these set the modified time for an empty file, so can_fetch() reaches its final return and returns True.
I think the behavior of robotparser for an empty file was undefined, which allowed these changes to go in, and it would be good to have a test for this behavior.
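
A minimal sketch of what such a check could look like on Python 3, assuming the current urllib.robotparser behavior described above. The helper name can_fetch_after_parse and the example.com URL are made up for illustration and are not part of the module or of any existing test:

```python
from urllib.robotparser import RobotFileParser


def can_fetch_after_parse(lines=None):
    """Hypothetical helper: optionally feed `lines` to parse(), then ask
    whether an arbitrary (assumed) URL may be fetched by any user agent."""
    parser = RobotFileParser()
    if lines is not None:
        # parse() calls modified(), which records the time of the check.
        parser.parse(lines)
    return parser.can_fetch("*", "http://example.com/page")


# parse() never called: the modified time is unset, so everything is denied.
never_parsed = can_fetch_after_parse()

# Empty robots.txt on Python 3: parse([]) still sets the modified time,
# and with no rules to match, can_fetch() falls through to True.
empty_file = can_fetch_after_parse([])

# Sanity check that a real rule is still honored.
deny_all = can_fetch_after_parse(["User-agent: *", "Disallow: /"])
```

On Python 2 the empty-file case would have behaved like the never-parsed case, since read() skipped parse() when there were no lines.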
[0] https://github.com/python/cpython/blob/f82e59ac4020a64c262a925230a8eb190b652e87/Lib/robotparser.py#L66-L67
[1] https://github.com/python/cpython/blob/149175c6dfc8455023e4335575f3fe3d606729f9/Lib/urllib/robotparser.py#L69-L70
----------
nosy: +berker.peksag, xtreak
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35457>
_______________________________________