[Web-SIG] python bug issue2464

Jean-Paul Calderone exarkun at divmod.com
Wed Aug 13 16:17:05 CEST 2008


On Wed, 13 Aug 2008 18:14:19 +0530, "O.R.Senthil Kumaran" <orsenthil at gmail.com> wrote:
>
>I am trying to write a fix for this bug http://bugs.python.org/issue2464
>- urllib2 can't handle http://www.wikispaces.com
>
>What is actually happening here is:
>
>1) urllib2 tries to open http://www.wikispaces.com
>2) It gets 302 Redirected to
>https://session.wikispaces.com/session/auth?authToken=1bd8784307f89a495cc1aafb075c4983
>3) It again gets 302 Redirected to:
>http://www.wikispaces.com?responseToken=1bd8784307f89a495cc1aafb075c4983
>
>After this, it gets a 200 code, but when the page is retrieved it is a 400 Bad Request!
>
>Firefox has NO problem getting the actual page, though.
>
>Here is the output of the session (I added a print header.items() call in the
>http_error_302 method of HTTPRedirectHandler):
>
>>>> obj1 = urllib2.urlopen("http://www.wikispaces.com")
>[('content-length', '0'), ('x-whom', 'w9-prod-http, p1'), ('set-cookie',
>'slave=1; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/, test=1; expires=Wed,
>13-Aug-2008 13:03:51 GMT; path=/'), ('server', 'nginx/0.6.30'), ('connection',
>'close'), ('location',
>'https://session.wikispaces.com/session/auth?authToken=4b3eecb5c1ab301689e446cf03b3a585'),
>('date', 'Wed, 13 Aug 2008 12:33:51 GMT'), ('p3p', 'CP: ALL DSP COR CURa ADMa
>DEVa CONo OUR IND ONL COM NAV INT CNT STA'), ('content-type', 'text/html;
>charset=utf-8')]
>[('content-length', '0'), ('x-whom', 'w8-prod-https, p1'), ('set-cookie',
>'master=1; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/,
>master=7de5d46e15fd23b1ddf782c565d4fb3a; expires=Thu, 14-Aug-2008 13:03:53 GMT;
>path=/; domain=session.wikispaces.com'), ('server', 'nginx/0.6.30'),
>('connection', 'close'), ('location',
>'http://www.wikispaces.com?responseToken=4b3eecb5c1ab301689e446cf03b3a585'),
>('date', 'Wed, 13 Aug 2008 12:33:53 GMT'), ('p3p', 'CP: ALL DSP COR CURa ADMa
>DEVa CONo OUR IND ONL COM NAV INT CNT STA'), ('content-type', 'text/html;
>charset=utf-8')]
>>>> print obj1.geturl()
>http://www.wikispaces.com?responseToken=4b3eecb5c1ab301689e446cf03b3a585
>>>> print obj1.code
>200
>>>> print obj1.headers
>
>>>> print obj1.info()
>
>>>> print obj1.read()
><html>
><head><title>400 Bad Request</title></head>
><body bgcolor="white">
><center><h1>400 Bad Request</h1></center>
><hr><center>nginx/0.6.30</center>
></body>
></html>
>
>With all this happening with urllib2, Firefox is able to handle this properly.
>Also, I notice that if I suffix the URL with a dummy path, say
>url = "http://www.wikispaces.com/dummy_url_path", the urlopen request will
>still go through 302-302-200, but with dummy_url_path appended in the
>redirections, and then read() will succeed!
>
>Please share your opinion on where you think urllib2 is going wrong
>here! I am not able to drill down to the fault point.
>This does NOT have to do with the null characters in the redirection URL noted
>in the bug report.
>

Some things:

  http://foo.com

This is not a valid URL.  The correct URL for the intended location here
is:

  http://foo.com/

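For what it's worth, the difference is easy to see with the standard
urlparse module (just an illustration, not something from the original
report):

  import urlparse

  # The first form has an empty path component.  Per RFC 2616 (section
  # 5.1.2), a client must then send "/" as the Request-URI on its own.
  print urlparse.urlsplit('http://foo.com').path    # ''
  print urlparse.urlsplit('http://foo.com/').path   # '/'
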
That missing path is the root of the problem, I suspect.  Firefox notices it
and fixes it when deciding what requests to make.  For example, while
urllib2 ultimately asks for this URL:

  ?responseToken=f02a955460b2cc180e9bf1faa8efd383

Firefox recognizes that this is silly and instead asks for:

  /?responseToken=5007a08643c2b4dd719a8848024b2c7a

The tokens are different because these are values from actual requests.
Notice the important difference, though - Firefox's request begins with
a /.
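
If you want to see the difference on the wire, the two request lines are
easy to reproduce by hand with httplib (a sketch only -- the token above is
from an old session, so substitute a fresh one; I have not run this):

  import httplib

  # What urllib2 ends up sending: a Request-URI with no leading "/".
  # In the transcript above, this is the request that draws the 400.
  conn = httplib.HTTPConnection('www.wikispaces.com')
  conn.request('GET', '?responseToken=f02a955460b2cc180e9bf1faa8efd383')
  print conn.getresponse().status

  # What Firefox sends: the same query string behind a "/" path.
  conn = httplib.HTTPConnection('www.wikispaces.com')
  conn.request('GET', '/?responseToken=f02a955460b2cc180e9bf1faa8efd383')
  print conn.getresponse().status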

Likely, urllib2 should do a bit more validation of its input and make
sure it is only making requests which follow the protocol.
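
Until then, one possible client-side workaround is to normalize the redirect
target yourself before urllib2 builds the follow-up request, e.g. with a
small HTTPRedirectHandler subclass along these lines (an untested sketch;
the class name is made up):

  import urllib2, urlparse

  class FixEmptyPathRedirectHandler(urllib2.HTTPRedirectHandler):
      # Rewrite redirect targets of the form http://host?query into
      # http://host/?query before handing them back to urllib2.
      def redirect_request(self, req, fp, code, msg, headers, newurl):
          scheme, netloc, path, query, fragment = urlparse.urlsplit(newurl)
          if netloc and not path:
              newurl = urlparse.urlunsplit(
                  (scheme, netloc, '/', query, fragment))
          return urllib2.HTTPRedirectHandler.redirect_request(
              self, req, fp, code, msg, headers, newurl)

  opener = urllib2.build_opener(FixEmptyPathRedirectHandler)
  # obj = opener.open('http://www.wikispaces.com/')  # note the trailing slash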

Jean-Paul

