[New-bugs-announce] [issue17403] Robotparser fails to parse some robots.txt

Tue Mar 12 11:58:24 CET 2013

New submission from Ben Mezger:

I am trying to parse Google's robots.txt (http://google.com/robots.txt), and the check for whether I can crawl the URL /catalogs/p? gives the wrong result: the URL is allowed, but the parser returns False. I described the problem in my question on Stack Overflow -> http://stackoverflow.com/questions/15344253/robotparser-doesnt-seem-to-parse-correctly

Someone answered that it has to do with the line "urllib.quote(urlparse.urlparse(urllib.unquote(url))[2])" in the robotparser module, since it keeps only the path component and so removes the "?" from the end of the URL.

Here is the answer I received -> http://stackoverflow.com/a/15350039/1649067
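
A minimal sketch of what the quoted expression does to the URL, in case it helps to reproduce the behavior. The helper name normalize() is made up for illustration and is not part of robotparser. The import fallback is only there so the same snippet runs on Python 2.7 and Python 3. The final prefix check is an illustration of why a rule ending in "?" can no longer match; it is not robotparser's actual matching code.

```python
try:
    # Python 3 locations of the same functions
    from urllib.parse import quote, unquote, urlparse
except ImportError:
    # Python 2.7 locations, as used in the quoted line
    from urllib import quote, unquote
    from urlparse import urlparse


def normalize(url):
    # Hypothetical helper wrapping the expression quoted above:
    # index [2] of the urlparse() result is the path component only,
    # so the (empty) query string and its "?" separator are dropped.
    return quote(urlparse(unquote(url))[2])


url = "/catalogs/p?"
normalized = normalize(url)

print(normalized)                             # the trailing "?" is gone
print(normalized.startswith("/catalogs/p?"))  # illustrative prefix check
```

With the "?" stripped, the URL that gets compared against the rules is "/catalogs/p", which no longer begins with "/catalogs/p?".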

----------
components: Library (Lib)
messages: 184017
nosy: benmezger
priority: normal
severity: normal
status: open
title: Robotparser fails to parse some robots.txt
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17403>
_______________________________________