urllib behaves strangely

Mon Jun 12 06:20:26 EDT 2006

Gabriel Zachmann wrote:

> Here is a very simple Python script utilizing urllib:
> 
>      import urllib
>      url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
>      print url
>      print
>      file = urllib.urlopen( url )
>      mime = file.info()
>      print mime
>      print file.read()
>      print file.geturl()
> 
> 
> However, when I execute it, I get an HTML error ("access denied").
> 
> On the one hand, the funny thing is that I can view the page fine
> in my browser, and I can download it fine using curl.
> 
> On the other hand, it must have something to do with the URL because
> urllib works fine with any other URL I have tried ...
> 
> Any ideas?
> I would very much appreciate any hints or suggestions.

The ':' in '..Commons:Feat..' is not a legal character in this part of the
URI and has to be %-quoted as '%3a'.
Try the URI
'http://commons.wikimedia.org/wiki/Commons%3aFeatured_pictures/chronological'.
Perhaps urllib is stricter than your browsers (which are known to accept
every b******t you feed into them, sometimes with very confusing results)
and gets confused when it tries to parse the malformed URI.
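
The %-quoting suggested above can be sketched roughly as follows. This is a
minimal sketch, assuming Python 3's urllib.parse (in the Python 2 that the
original script targets, the equivalent function is urllib.quote). The helper
name quote_path is hypothetical. It only builds the quoted URI and does not
fetch the page, so it does not show whether the "access denied" error goes
away.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_path(url):
    """Return url with its path component %-quoted (hypothetical helper)."""
    parts = urlsplit(url)
    # safe="/" keeps the path separators as they are; ':' is not in the
    # safe set, so it is replaced by its %-escape.
    quoted_path = quote(parts.path, safe="/")
    return urlunsplit(
        (parts.scheme, parts.netloc, quoted_path, parts.query, parts.fragment)
    )


url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
quoted_url = quote_path(url)
print(quoted_url)
```

Percent-escapes are case-insensitive, so the '%3A' produced here is
equivalent to the '%3a' written above. A path that already contains
%-escapes would be quoted a second time by this helper, so it should only be
applied to unquoted URLs.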

-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/