[issue15851] Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default.

Senthil Kumaran report at bugs.python.org
Tue Sep 11 06:44:57 CEST 2012


Senthil Kumaran added the comment:

Hello Eduardo,

I fail to see the bug here. The robotparser module is for reading and
parsing the robots.txt file; the module responsible for fetching it
would be urllib. robots.txt is always available from the web server,
and you can download it by any means, even by using robotparser's
read() after providing the full URL to robots.txt. You do not need to
set a User-Agent to read/fetch the robots.txt file. Once it is
fetched, when you crawl the site with your custom-written crawler or
with urllib, you can honor the User-Agent requirement by sending the
proper headers with your request. That can be done with the urllib
module itself, and I believe there is documentation on adding
headers. Both steps are sketched below.
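A minimal sketch of the parsing step, in Python 2 to match the module
names above (example.com and the agent string are placeholders):

    import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")  # full URL to robots.txt
    rp.read()  # fetches and parses it; no custom User-Agent is needed

    # Or download robots.txt yourself by any means and feed it in:
    # rp.parse(data.splitlines())

    # Ask whether a given agent may crawl a given path.
    print rp.can_fetch("MyCrawler/1.0", "http://example.com/some/page")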
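And a sketch of the crawling step, where the custom User-Agent
actually matters; urllib2 is one way to send the header in Python 2:

    import urllib2

    req = urllib2.Request("http://example.com/some/page",
                          headers={"User-Agent": "MyCrawler/1.0"})
    body = urllib2.urlopen(req).read()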

I think this is the way most folks would be (or, I believe, are)
using it. Am I missing something? If my explanation above is okay,
then we can close this bug as invalid.

Thanks,
Senthil

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue15851>
_______________________________________

