urlopen returns forbidden

Grant Edwards invalid at invalid.invalid
Mon Feb 28 10:21:06 EST 2011


On 2011-02-28, Chris Rebert <clp2 at rebertia.com> wrote:
> On Sun, Feb 27, 2011 at 9:38 PM, monkeys paw <monkey at joemoney.net> wrote:
>> I have a working urlopen routine which opens
>> a url, parses it for <a> tags and prints out
>> the links in the page. On some sites, wikipedia for
>> instance, i get a
>>
>> HTTP error 403, forbidden.
>>
>> What is the difference in accessing the site through a web browser
>> and opening/reading the URL with python urllib2.urlopen?
>
> The User-Agent header (http://en.wikipedia.org/wiki/User_agent ).

Sometimes you also need to set the Referer header (that is how HTTP
spells "Referrer") for pages that don't allow direct-linking from
"outside".
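
For what it's worth, here is a rough sketch of setting both headers on
a request. It uses Python 3's urllib.request (the successor to the
urllib2 module mentioned above) so that it runs on a current
interpreter. The helper name build_request, the User-Agent string, and
the Referer value are placeholders I made up for illustration, not
anything a particular site requires.

```python
import urllib.request


def build_request(url, user_agent, referer=None):
    """Return a Request carrying a custom User-Agent and optional Referer.

    Hypothetical helper for illustration; not part of the standard library.
    """
    headers = {"User-Agent": user_agent}
    if referer is not None:
        # Note the HTTP spelling: "Referer", with a single "r" in the middle.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


req = build_request(
    "http://en.wikipedia.org/wiki/User_agent",
    # Made-up identifying string; use one that describes your own script.
    user_agent="example-link-lister/0.1 (contact: you@example.com)",
    referer="http://en.wikipedia.org/",
)

# Nothing has been sent yet. To actually fetch the page you would do:
#     with urllib.request.urlopen(req) as response:
#         html = response.read()
```

Whether a given site accepts the request still depends on that site's
own policy, so treat this as a starting point only.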

As somebody else has already said, if the site provides an API that
they want you to use, you should do so rather than hammering their web
server with a screen-scraper.

Not only is it a lot less load on the site, it's usually a lot easier.
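
To illustrate the API route, here is a sketch that builds a query URL
for the links on one page. It assumes the MediaWiki web API at
/w/api.php, since wikipedia was the example in the original question;
the function name build_links_query is my own invention, and the code
only constructs the URL without contacting the server.

```python
from urllib.parse import urlencode

# Assumed endpoint: the MediaWiki API for the English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_links_query(title):
    """Return a URL asking the API for the links on the page `title`.

    Hypothetical helper for illustration only.
    """
    params = {
        "action": "query",   # run a query
        "prop": "links",     # ask for the links on the page
        "titles": title,     # which page to inspect
        "pllimit": "max",    # as many links per response as allowed
        "format": "json",    # machine-readable output instead of HTML
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = build_links_query("User agent")
print(url)
```

The response comes back as JSON, so there is no HTML to parse for <a>
tags. You would still want to send a descriptive User-Agent with the
request.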

-- 
Grant Edwards               grant.b.edwards        Yow! Look DEEP into the
                                  at               OPENINGS!!  Do you see any
                              gmail.com            ELVES or EDSELS ... or a
                                                   HIGHBALL?? ...


