urllib behaves strangely

John Hicken john.hicken at gmail.com
Mon Jun 12 06:43:07 EDT 2006


Gabriel Zachmann wrote:

> Here is a very simple Python script utilizing urllib:
>
>      import urllib
>      url =
> "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
>      print url
>      print
>      file = urllib.urlopen( url )
>      mime = file.info()
>      print mime
>      print file.read()
>      print file.geturl()
>
>
> However, when I execute it, I get an HTML error ("access denied").
>
> On the one hand, the funny thing is that I can view the page fine in my
> browser, and I can download it fine using curl.
>
> On the other hand, it must have something to do with the URL, because urllib
> works fine with any other URL I have tried ...
>
> Any ideas?
> I would appreciate very much any hints or suggestions.
>
> Best regards,
> Gabriel.
>
>
> --
> /-----------------------------------------------------------------------\
> | If you know exactly what you will do --                               |
> | why would you want to do it?                                          |
> |                                                       (Picasso)       |
> \-----------------------------------------------------------------------/

I think the problem might be with the Wikimedia Commons website itself,
rather than with urllib.  Wikipedia has a policy against unapproved bots:
http://en.wikipedia.org/wiki/Wikipedia:Bots

It might be that Wikimedia Commons blocks bots that aren't approved, and
considers your program a bot.  I've had a similar error message from
www.wikipedia.org, but no problems with a couple of other websites I've
tried.  Also, the HTML the program returns seems to be a standard
"ACCESS DENIED" page.

It might be worth asking at the Wikimedia Commons website, at least to
eliminate this possibility.
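If the block is keyed on the User-Agent string (urllib identifies itself
as "Python-urllib/x.y"), one workaround -- assuming the server really is
filtering on that header -- is to send a descriptive User-Agent of your
own.  A sketch using urllib2 (the User-Agent value and contact address
below are made up; Wikipedia's bot policy asks that it identify your
script):

```python
# Build a request with a custom User-Agent, in case the server
# rejects urllib's default "Python-urllib/x.y" identification.
try:
    import urllib2 as urlrequest          # Python 2
except ImportError:
    import urllib.request as urlrequest   # same API in Python 3

url = ("http://commons.wikimedia.org/wiki/"
       "Commons:Featured_pictures/chronological")

# A descriptive string with contact info, per the bot policy;
# this particular value is just an example.
request = urlrequest.Request(
    url, headers={"User-Agent": "MyScript/0.1 (contact: me at example.com)"})

# Uncomment to actually fetch the page:
# page = urlrequest.urlopen(request)
# print(page.read())
```

Note that this only helps if the User-Agent is what triggers the block;
if the site is filtering by IP or requires approval, changing the header
won't (and shouldn't) get around it.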

John Hicken
