[issue32936] RobotFileParser.parse() should raise an exception when the robots.txt file is invalid

Oudin report at bugs.python.org
Sat Feb 24 05:53:13 EST 2018


New submission from Oudin <oudin at crans.org>:

When processing an ill-formed robots.txt file (like https://tiny.tobast.fr/robots-file ), the RobotFileParser.parse method does not populate the entries or default_entry attributes.
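The behaviour can be reproduced without fetching the linked file by feeding parse() a hypothetical ill-formed robots.txt in which rules appear before any User-agent line (the content below is an illustrative stand-in, not the actual file from the report):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical ill-formed robots.txt: rule lines precede any
# User-agent line, so parse() silently discards them.
malformed = [
    "Disallow: /private/",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(malformed)

# No entry was recognised, yet no error is reported:
print(parser.default_entry)  # None
print(parser.entries)        # []

# can_fetch() then allows everything, masking the parse failure.
print(parser.can_fetch("MyBot", "http://example.com/private/"))  # True
```

Since can_fetch() falls back to allowing the URL when no entry matches, a caller has no direct signal that the file was unusable.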

In my opinion, the method should raise an exception when the robots.txt file contains no valid User-agent entry (or contains an invalid one).
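One possible shape for the proposed change, sketched here as a hypothetical subclass rather than a patch to the stdlib (the class name and the choice of ValueError are assumptions, not anything decided in this issue):

```python
from urllib.robotparser import RobotFileParser

class StrictRobotFileParser(RobotFileParser):
    """Hypothetical variant that fails loudly on an unusable robots.txt."""

    def parse(self, lines):
        super().parse(lines)
        # If no User-agent group was recognised, the file was not a
        # usable robots.txt: raise instead of silently allowing all.
        if not self.entries and self.default_entry is None:
            raise ValueError("no valid User-agent entry found in robots.txt")

# A well-formed file still parses normally:
ok = StrictRobotFileParser()
ok.parse(["User-agent: *", "Disallow: /private/"])

# An ill-formed file (rules before any User-agent) now raises:
bad = StrictRobotFileParser()
try:
    bad.parse(["Disallow: /private/"])
except ValueError as exc:
    print(exc)  # no valid User-agent entry found in robots.txt
```

An alternative, backwards-compatible design would be to document default_entry/entries, or to add a dedicated query method, instead of raising.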

Otherwise, the only way to detect the problem is to check whether default_entry is None, and that attribute is not covered in the documentation (https://docs.python.org/dev/library/urllib.robotparser.html).

Depending on your opinion on this, I can implement what is necessary and open a PR on GitHub.

----------
components: Library (Lib)
messages: 312711
nosy: Guinness
priority: normal
severity: normal
status: open
title: RobotFileParser.parse() should raise an exception when the robots.txt file is invalid
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue32936>
_______________________________________
