bug in urllib2/python23 ?

Alan Kennedy alanmk at hotmail.com
Fri Jun 20 04:53:52 EDT 2003


Achim Domma wrote:

> the following lines
>
> import urllib2
> urllib2.urlopen('http://fr.allafrica.com/health/newswire')
>
> produce the following backtrace on my computer:
> 
>   File "D:\Python23\lib\urllib2.py", line 136, in urlopen
>     return _opener.open(url, data)
>   File "D:\Python23\lib\urllib2.py", line 330, in open
>     '_open', req)
>   File "D:\Python23\lib\urllib2.py", line 309, in _call_chain
>     result = func(*args)
>   File "D:\Python23\lib\urllib2.py", line 824, in http_open
>     return self.do_open(httplib.HTTP, req)
>   File "D:\Python23\lib\urllib2.py", line 818, in do_open
>     return self.parent.error('http', req, fp, code, msg, hdrs)
>   File "D:\Python23\lib\urllib2.py", line 350, in error
>     result = self._call_chain(*args)
>   File "D:\Python23\lib\urllib2.py", line 309, in _call_chain
>     result = func(*args)
>   File "D:\Python23\lib\urllib2.py", line 447, in http_error_302
>     new = self.redirect_request(req, fp, code, msg, headers)
>   File "D:\Python23\lib\urllib2.py", line 421, in redirect_request
>     if (code in (301, 302, 303, 307) and req.method() in ("GET", "HEAD") or
>   File "D:\Python23\lib\urllib2.py", line 208, in __getattr__
>     raise AttributeError, attr
> AttributeError: method

You have fallen foul of bug 731116:

http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=731116

There is a source patch associated with the bug, if you can't wait for a future
release.

http://sourceforge.net/tracker/index.php?func=detail&aid=731153&group_id=5470&atid=305470

> If I add a trailing slash to the url, everything works fine. Looks like a
> bug to me!? Or is there a reason for this behaviour? 

The reason why adding the slash works is as follows.

1. You send the URL "/health/newswire" to "fr.allafrica.com".
2. The server there realises that the URL needs an extra "/" in order to be
valid.
3. It sends you back a 302 - Redirect status code, essentially asking you to
make your request again, with the updated URL, i.e. "/health/newswire/".
4. urllib2 automatically takes care of the redirect for you, i.e. it makes the
second request, unbeknown to you.
5. urllib2 fails, because of bug 731116.

By adding the slash yourself, you give the web server a URL it can resolve
directly, so it doesn't send back a redirect, and the code that raises the
exception is never executed.
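
The steps above can be sketched as a toy model. This is not urllib2's code and
makes no network requests; fake_server and fetch are hypothetical names, and
the redirect rule is simply assumed to match what the thread describes for
this one URL.

```python
# Toy model of steps 1-4: a server that redirects a slash-less path,
# and a client that follows the redirect on the caller's behalf.

def fake_server(path):
    """Pretend origin server: returns (status, headers, body)."""
    if path == "/health/newswire":
        # Steps 2-3: the path needs a trailing slash, so answer with a 302.
        return 302, {"Location": "/health/newswire/"}, ""
    if path == "/health/newswire/":
        return 200, {}, "newswire page"
    return 404, {}, ""


def fetch(path, max_redirects=5):
    """Request path, following 302 responses; also report every request made."""
    requests_made = []
    for _ in range(max_redirects + 1):
        requests_made.append(path)
        status, headers, body = fake_server(path)
        if status != 302:
            return status, body, requests_made
        # Step 4: re-request the URL named in the Location header.
        path = headers["Location"]
    raise RuntimeError("too many redirects")
```

Calling fetch("/health/newswire") makes two requests, the second one to
"/health/newswire/", whereas fetch("/health/newswire/") makes only one. The
second request is where the real urllib2 hits the bug.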

> If yes, how could I
> decide whether I have to add a slash or not?

You can't. How a URL resolves is decided inside the origin server, so you have
no way of knowing in advance whether a request will result in a redirect.
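
What you can do is detect a redirect after the fact. A minimal sketch, assuming
a response object with a geturl() method (urllib2 responses have one) and an
installation where redirects work at all, i.e. one with the patch applied.
was_redirected and FakeResponse are hypothetical names for illustration only.

```python
def was_redirected(requested_url, response):
    """Return True if the response came from a different URL than requested."""
    # geturl() reports the URL of the resource actually retrieved, which
    # differs from the requested URL when a redirect was followed.
    return response.geturl() != requested_url


class FakeResponse:
    """Stand-in for a urllib2 response so the sketch runs without a network."""

    def __init__(self, final_url):
        self._final_url = final_url

    def geturl(self):
        return self._final_url


requested = 'http://fr.allafrica.com/health/newswire'
# What a followed redirect to the slash-terminated URL would look like.
response = FakeResponse(requested + '/')
print(was_redirected(requested, response))
```

With a real response the call would be was_redirected(url, urllib2.urlopen(url)).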

Whoever gave you that URL got it wrong.

If you want to solve the problem without patching your Python, I recommend that
you catch the AttributeError exception and ignore it, e.g.

import urllib2

res = None  # stays None if the redirect triggers the bug
try:
    res = urllib2.urlopen('http://fr.allafrica.com/health/newswire')
except AttributeError:
    pass
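
A variation on the workaround above, if you would rather get the page than
nothing: retry once with a trailing slash. This is only a sketch. It assumes
the redirect does nothing more than add the slash (true for this URL according
to the explanation above, not true in general) and that the URL has no query
string or fragment. open_with_slash_retry and buggy_opener are hypothetical
names; the opener is passed in so the sketch runs without urllib2 or a network.

```python
def open_with_slash_retry(url, opener):
    """Call opener(url); on AttributeError, retry once with a trailing slash."""
    try:
        return opener(url)
    except AttributeError:
        if url.endswith('/'):
            # Already had a slash, so this is some other failure: re-raise.
            raise
        return opener(url + '/')


def buggy_opener(url):
    """Hypothetical stand-in for urllib2.urlopen on an unpatched Python 2.3."""
    if not url.endswith('/'):
        # Mimics the failure in the traceback for a redirected URL.
        raise AttributeError('method')
    return 'contents of ' + url


page = open_with_slash_retry('http://fr.allafrica.com/health/newswire',
                             buggy_opener)
```

In the real setting the call would be open_with_slash_retry(url, urllib2.urlopen).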

HTH,

-- 
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan



