urllib behaves strangely
John Hicken
john.hicken at gmail.com
Mon Jun 12 06:43:07 EDT 2006
Gabriel Zachmann wrote:
> Here is a very simple Python script utilizing urllib:
>
> import urllib
> url =
> "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
> print url
> print
> file = urllib.urlopen( url )
> mime = file.info()
> print mime
> print file.read()
> print file.geturl()
>
>
> However, when I execute it, I get an HTML error ("access denied").
>
> On the one hand, the funny thing is that I can view the page fine in my
> browser, and I can download it fine using curl.
>
> On the other hand, it must have something to do with the URL, because urllib
> works fine with any other URL I have tried ...
>
> Any ideas?
> I would appreciate very much any hints or suggestions.
>
> Best regards,
> Gabriel.
>
>
> --
> /-----------------------------------------------------------------------\
> | If you know exactly what you will do -- |
> | why would you want to do it? |
> | (Picasso) |
> \-----------------------------------------------------------------------/
I think the problem might be with the Wikimedia Commons website itself,
rather than urllib. Wikipedia has a policy against unapproved bots:
http://en.wikipedia.org/wiki/Wikipedia:Bots
It might be that Wikimedia Commons blocks bots that aren't approved,
and considers your program a bot. I've had similar error messages
from www.wikipedia.org, but no problems with a couple of other
websites I've tried. Also, the HTML the program returns seems to be a
standard "ACCESS DENIED" page.
It might be worth asking at the Wikimedia Commons website, at least to
eliminate this possibility.
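As a quick aside (untested against Commons specifically): urllib announces
itself with a "Python-urllib/x.y" User-Agent, which is likely what the server
is keying on. If the site permits it, sending your own User-Agent header often
gets past the blanket block. A minimal sketch — the agent string
"MyFeaturedPictureFetcher/0.1" is just a made-up example, and I've used a
try/except import so the same snippet works with the 2.x urllib2 API and the
urllib.request API of newer Pythons:

```python
try:
    import urllib2 as urlrequest          # Python 2, as in the original post
except ImportError:
    import urllib.request as urlrequest   # Python 3 renamed it urllib.request

url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"

# Build the request with an explicit User-Agent instead of the default
# "Python-urllib/x.y" that the server appears to reject.
req = urlrequest.Request(
    url, headers={"User-Agent": "MyFeaturedPictureFetcher/0.1 (example)"}
)

# Request stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

Calling urlrequest.urlopen(req).read() would then fetch the page with that
header — but note that the bot policy linked above still applies; a custom
agent only avoids the blanket block, it doesn't make an unapproved bot
approved.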
John Hicken