[Python-bugs-list] [ python-Bugs-690214 ] robotparser only applies first applicable rule
SourceForge.net
noreply@sourceforge.net
Thu, 06 Mar 2003 00:27:13 -0800
Bugs item #690214, was opened at 2003-02-20 12:55
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=690214&group_id=5470
Category: Python Library
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Mark Pilgrim (f8dy)
Assigned to: Skip Montanaro (montanaro)
Summary: robotparser only applies first applicable rule
Initial Comment:
In the robotparser module, robotparser.py::RobotFileParser::can_fetch
currently returns the result of the first applicable rule. It
should loop through all rules looking for anything that
disallows access. For example, if your first rule applies
to 'wget' and 'python' and disallows access to /dir1/, and
your second rule is a 'python' rule that disallows access
to /dir2/, robotparser will falsely claim that python is
allowed to access /dir2/.
Patch against current source attached.
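
The scenario above can be sketched as follows. This is a minimal
standalone model, not the robotparser source or the attached patch:
RECORDS, _applies, can_fetch_first_match and can_fetch_any_disallow are
hypothetical names introduced only for illustration, and each record is
assumed to be a pair of (user-agent names, disallowed path prefixes).

```python
# Hypothetical model of the robots.txt described in the report:
# each record is (user-agent names, disallowed path prefixes).
RECORDS = [
    (["wget", "python"], ["/dir1/"]),
    (["python"], ["/dir2/"]),
]


def _applies(agents, useragent):
    """True if any agent name in the record matches the given user agent."""
    ua = useragent.lower()
    return any(a == "*" or a.lower() in ua for a in agents)


def can_fetch_first_match(records, useragent, path):
    """Reported behavior: only the first record naming the agent is consulted."""
    for agents, disallowed in records:
        if _applies(agents, useragent):
            return not any(path.startswith(p) for p in disallowed)
    return True  # no record names this agent, so nothing is disallowed


def can_fetch_any_disallow(records, useragent, path):
    """Requested behavior: any record naming the agent may deny access."""
    for agents, disallowed in records:
        if _applies(agents, useragent) and any(
            path.startswith(p) for p in disallowed
        ):
            return False
    return True


print(can_fetch_first_match(RECORDS, "python", "/dir2/index.html"))   # True
print(can_fetch_any_disallow(RECORDS, "python", "/dir2/index.html"))  # False
```

Under first-match semantics the second 'python' record is never
reached, which is why /dir2/ is reported as allowed.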
----------------------------------------------------------------------
>Comment By: Skip Montanaro (montanaro)
Date: 2003-03-06 02:27
Message:
Logged In: YES
user_id=44345
Closing, as it appears robotparser's behavior matches the RFC, as Bastian
indicated.
----------------------------------------------------------------------
Comment By: Bastian Kleineidam (calvin)
Date: 2003-03-03 05:46
Message:
Logged In: YES
user_id=9205
Mark, if you dive into
http://www.robotstxt.org/wc/norobots-rfc.txt you'll note
that the first matching user-agent line as well as the first
matching allow or disallow line must be obeyed by the robot
(see 3.2.1 and 3.2.2).
Now, I am not opposed to disobeying the above RFC, but there
are other arguments against your patch:
a) it breaks current implementations of robots.txt
(potentially disallowing access to sites)
b) your problem is easily solved by moving Disallow and/or
User-Agent lines to the top
Therefore my count is -1 for this patch.
Cheers, Bastian
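
As a rough illustration of point b), the robots.txt from the report
could be reordered so that the first record matching 'python' already
lists every path it must avoid. The file contents below are a
hypothetical rewrite, not something taken from the report, and the
sketch assumes the Python 3 location of the module
(urllib.robotparser) rather than the standalone robotparser module
discussed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the 'python' record comes first and carries
# both Disallow lines, so the first matching record is already complete.
ROBOTS_TXT = """\
User-agent: python
Disallow: /dir1/
Disallow: /dir2/

User-agent: wget
Disallow: /dir1/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network access

print(rp.can_fetch("python", "/dir1/page.html"))  # False
print(rp.can_fetch("python", "/dir2/page.html"))  # False
print(rp.can_fetch("wget", "/dir2/page.html"))    # True
```

With the records arranged this way, the answer does not depend on
whether the parser stops at the first matching record or consults all
of them.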
----------------------------------------------------------------------