[Tutor] fetching wikipedia articles

amit sethi amit.pureenergy at gmail.com
Fri Jan 23 12:07:53 CET 2009


Well, thanks, that worked well. But robotparser is built on urllib; isn't
there a module like robotparser for urllib2?
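
In other words, I am after something like this sketch: fetch robots.txt
with urllib2 (so I can set my own User-Agent) and hand the lines to
robotparser myself. Untested, assuming Python 2; 'MyBot/0.1' and the
article URL are just placeholders:

    import robotparser
    import urllib2

    # Fetch robots.txt ourselves with urllib2 and a custom User-Agent,
    # then feed the lines to RobotFileParser.parse().
    req = urllib2.Request('http://en.wikipedia.org/robots.txt',
                          headers={'User-Agent': 'MyBot/0.1'})
    data = urllib2.urlopen(req).read()

    rp = robotparser.RobotFileParser()
    rp.parse(data.splitlines())
    print rp.can_fetch('MyBot', 'http://en.wikipedia.org/wiki/Biryani')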

On Fri, Jan 23, 2009 at 3:55 PM, Andre Engels <andreengels at gmail.com> wrote:

> On Fri, Jan 23, 2009 at 10:37 AM, amit sethi <amit.pureenergy at gmail.com>
> wrote:
> > So, is there a way around that problem?
>
> OK, I have done some checking, and it seems that the Wikipedia server
> returns a 403 (Forbidden) status code but still serves the page, which
> I think is weird behaviour. I will ask the Wikimedia developers why it
> does this, but for now you can work around it by editing robotparser.py
> in the following way:
>
> In robotparser.py, add the following at the end of the __init__ of the
> class URLopener, so that robots.txt is fetched with your own User-Agent
> instead of the default one:
>
>     self.addheaders = [header for header in self.addheaders
>                        if header[0] != "User-Agent"] + [('User-Agent', '<whatever>')]
>
> (Probably
>
>     self.addheaders = [('User-Agent', '<whatever>')]
>
> does the same, but my version is safer: it replaces only the User-Agent
> header and leaves the other default headers intact.)
>
> --
> André Engels, andreengels at gmail.com
>
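
In case anyone else hits this: the same fix should also be possible
without editing the standard library file, by swapping in a subclass of
the opener that robotparser uses. Untested sketch, assuming the Python 2
robotparser, where URLopener is a module-level class; 'MyBot/0.1' is a
placeholder User-Agent:

    import robotparser

    _BaseOpener = robotparser.URLopener   # keep a reference to the original

    class MyOpener(_BaseOpener):
        def __init__(self, *args):
            _BaseOpener.__init__(self, *args)
            # Replace the default Python User-Agent, keep the other headers.
            self.addheaders = [h for h in self.addheaders
                               if h[0] != "User-Agent"]
            self.addheaders.append(('User-Agent', 'MyBot/0.1'))

    robotparser.URLopener = MyOpener   # RobotFileParser.read() picks this up

    rp = robotparser.RobotFileParser()
    rp.set_url('http://en.wikipedia.org/robots.txt')
    rp.read()
    print rp.can_fetch('MyBot', 'http://en.wikipedia.org/wiki/Biryani')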



-- 
A-M-I-T S|S

