Problem with Python's "robots.txt" file parser in module robotparser

John Nagle nagle at animats.com
Wed Jul 11 18:23:59 EDT 2007


Nikita the Spider wrote:

> 
> Hi John,
> Are you sure you're not confusing your sites? The robots.txt file at 
> www.ibm.com contains the double slashed path. The robots.txt file at 
> ibm.com  is different and contains this which would explain why you 
> think all URLs are denied:
> User-agent: *
> Disallow: /
>
    Ah, that's it.  The problem is that "ibm.com" redirects to
"http://www.ibm.com", but "ibm.com/robots.txt" does not
redirect.  For comparison, try "microsoft.com/robots.txt",
which does redirect.
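
    To see why the two-line file quoted above makes every URL look
denied, here is a minimal sketch that feeds robots.txt rules to the
parser directly, with no network access. It assumes the Python 3 module
name "urllib.robotparser" (the module was plain "robotparser" when this
was written); the helper "can_fetch_with" and the "/private/" rule set
are made up for comparison and are not taken from either IBM file.

```python
from urllib.robotparser import RobotFileParser


def can_fetch_with(robots_lines, url, agent="*"):
    """Parse robots.txt lines in memory and ask whether `agent` may fetch `url`.

    Hypothetical helper for illustration; it never touches the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() takes an iterable of lines
    return parser.can_fetch(agent, url)


# The rules quoted above as served at ibm.com/robots.txt: deny everything.
DENY_ALL = ["User-agent: *", "Disallow: /"]

# An invented, narrower rule set, used only as a contrast.
DENY_PRIVATE = ["User-agent: *", "Disallow: /private/"]

blocked = can_fetch_with(DENY_ALL, "http://www.ibm.com/index.html")
allowed = can_fetch_with(DENY_PRIVATE, "http://www.ibm.com/index.html")
private = can_fetch_with(DENY_PRIVATE, "http://www.ibm.com/private/x.html")

print(blocked, allowed, private)
```

    A crawler that reads robots.txt from the bare host name but then
follows the redirect for the page itself would apply the deny-all rules
to pages actually served from the "www" host.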

   				John Nagle


