robotparser behavior on 403 (Forbidden) robots.txt files
John Nagle
nagle at animats.com
Mon Jun 2 12:40:52 EDT 2008
I just discovered that the "robotparser" module interprets
a 403 ("Forbidden") status on a "robots.txt" file as meaning
"all access disallowed". That's unexpected behavior.
A major site ("http://www.aplus.net/robot.txt") has its
"robots.txt" file set up that way.
There's no real "robots.txt" standard, unfortunately.
So it's not definitively a bug.
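For anyone who wants to reproduce this without hitting a live site, here is
a minimal sketch. It assumes Python 3, where the module lives at
"urllib.robotparser" rather than the top-level "robotparser" discussed above,
and it simulates the HTTP status by patching "urllib.request.urlopen". The
helper name "can_fetch_given_status" and the example.com URLs are made up for
illustration.

```python
import io
import urllib.error
import urllib.request
import urllib.robotparser
from unittest import mock


def can_fetch_given_status(status, page_url="http://example.com/page.html"):
    """Report what RobotFileParser decides when robots.txt returns `status`.

    Hypothetical helper: no network traffic happens, because urlopen is
    patched to raise an HTTPError carrying the requested status code.
    """
    parser = urllib.robotparser.RobotFileParser("http://example.com/robots.txt")
    # Build the error urlopen would raise for a non-2xx response.
    error = urllib.error.HTTPError(
        parser.url, status, "simulated response", {}, io.BytesIO(b"")
    )
    with mock.patch("urllib.request.urlopen", side_effect=error):
        parser.read()
    # "*" asks about the default user agent.
    return parser.can_fetch("*", page_url)


if __name__ == "__main__":
    for status in (401, 403, 404):
        print(status, can_fetch_given_status(status))
```

In the Python 3 module, 401 and 403 come back as "not allowed", while other
4xx codes such as 404 come back as "allowed", so a crawler that wants a
different policy for 403 would have to fetch and check the status itself
before handing the file to the parser.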
John Nagle
SiteTruth