[Web-SIG] python bug issue2464

O.R.Senthil Kumaran orsenthil at gmail.com
Wed Aug 13 14:44:19 CEST 2008


I am trying to write a fix for this bug: http://bugs.python.org/issue2464
- urllib2 can't handle http://www.wikispaces.com

What is actually happening here is:

1) urllib2 tries to open http://www.wikispaces.com 
2) It gets 302 Redirected to
https://session.wikispaces.com/session/auth?authToken=1bd8784307f89a495cc1aafb075c4983
3) It again gets 302 Redirected to:
http://www.wikispaces.com?responseToken=1bd8784307f89a495cc1aafb075c4983

After this, it gets a 200 code, but the page it retrieves is a 400 Bad Request!

Firefox has NO problem in getting the actual page though.
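
(In case anyone wants to reproduce the header dump below without editing
urllib2.py directly, a subclass along the following lines should print the
same information; the LoggingRedirectHandler name is just mine, for
illustration.)

    import urllib2

    class LoggingRedirectHandler(urllib2.HTTPRedirectHandler):
        # Print the headers of each 302 response, then let the base
        # class follow the redirect as it normally would.
        def http_error_302(self, req, fp, code, msg, headers):
            print headers.items()
            return urllib2.HTTPRedirectHandler.http_error_302(
                self, req, fp, code, msg, headers)

    opener = urllib2.build_opener(LoggingRedirectHandler())
    obj1 = opener.open("http://www.wikispaces.com")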

Here is the output of the session (I have added a print headers.items() in the
http_error_302 method of HTTPRedirectHandler):

>>> obj1 = urllib2.urlopen("http://www.wikispaces.com")
[('content-length', '0'), ('x-whom', 'w9-prod-http, p1'), ('set-cookie',
'slave=1; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/, test=1; expires=Wed,
13-Aug-2008 13:03:51 GMT; path=/'), ('server', 'nginx/0.6.30'), ('connection',
'close'), ('location',
'https://session.wikispaces.com/session/auth?authToken=4b3eecb5c1ab301689e446cf03b3a585'),
('date', 'Wed, 13 Aug 2008 12:33:51 GMT'), ('p3p', 'CP: ALL DSP COR CURa ADMa
DEVa CONo OUR IND ONL COM NAV INT CNT STA'), ('content-type', 'text/html;
charset=utf-8')]
[('content-length', '0'), ('x-whom', 'w8-prod-https, p1'), ('set-cookie',
'master=1; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/,
master=7de5d46e15fd23b1ddf782c565d4fb3a; expires=Thu, 14-Aug-2008 13:03:53 GMT;
path=/; domain=session.wikispaces.com'), ('server', 'nginx/0.6.30'),
('connection', 'close'), ('location',
'http://www.wikispaces.com?responseToken=4b3eecb5c1ab301689e446cf03b3a585'),
('date', 'Wed, 13 Aug 2008 12:33:53 GMT'), ('p3p', 'CP: ALL DSP COR CURa ADMa
DEVa CONo OUR IND ONL COM NAV INT CNT STA'), ('content-type', 'text/html;
charset=utf-8')]
>>> print obj1.geturl()
http://www.wikispaces.com?responseToken=4b3eecb5c1ab301689e446cf03b3a585
>>> print obj1.code
200
>>> print obj1.headers

>>> print obj1.info()

>>> print obj1.read()
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx/0.6.30</center>
</body>
</html>

With all this happening with urllib2, Firefox is able to handle this properly.
Also, I notice that if I suffix the URL with a dummy path, say
url = "http://www.wikispaces.com/dummy_url_path", the urlopen request will
still go through 302-302-200, but with dummy_url_path appended in the
redirections, and then read() will succeed!
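
If it helps in drilling down, the following small check mirrors (roughly) how
urllib2.Request derives the host and selector that end up on the HTTP request
line, so it at least shows what changes when the dummy path is added. The
responseToken value here is made up:

    import urllib

    for url in ("http://www.wikispaces.com?responseToken=abcdef",
                "http://www.wikispaces.com/dummy_url_path?responseToken=abcdef"):
        # urllib2.Request does roughly this internally: splittype(), then
        # splithost(); the second value from splithost() becomes the
        # selector sent on the request line.
        scheme, rest = urllib.splittype(url)
        host, selector = urllib.splithost(rest)
        print host, '->', repr(selector)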

Please share your opinion on where you think urllib2 is going wrong here;
I am not able to drill down to the fault point.
This has NOTHING to do with the null characters in the redirection URL that
are noted in the bug report.

Thanks,
Senthil
