Headers issue with urllib2 and ClientCookie (was: ClientCookie .read() failing on some servers)

Patrick.Bussi at space.alcatel.fr
Mon Apr 7 05:03:32 EDT 2003


Hello,

Sorry for the previous incomplete mail, which was sent to the list by mistake.

I am puzzled by a piece of software that behaves differently depending on
which web site I send the request to.

Here are:

[1] a short test program, which simply creates a urllib2 Request object, adds
headers to it, sends the request to the server and gets the response (uncomment
one line or the other to use ClientCookie or urllib2)

[2] an extract of the dump of results for www.python.org (correct)

[3] the dump of results for urllib2 opening www.lycos.fr, showing a
urllib2.HTTPError: HTTP Error 302

[4] the dump of results for ClientCookie opening www.lycos.fr

[5] a tcpdump trace on the PPP link, showing that my headers do not seem to be
taken into account, which could explain the failure.

Questions:

A/ Do the wrong headers explain the failure in the server response?
B/ What is wrong with the headers in my test program?


[1]----------snip----------
#! /usr/bin/env python
'''usage:
         $ python test.py www.python.org
'''

def test(h):
    '''h is the host name. Caution: no input validation is done.
    '''
    import ClientCookie, urllib2, urllib
    from urllib2 import Request
    req = urllib2.Request('http://'+h)
    req.add_header('Host',h)
    req.add_header('User-agent',
'Mozilla/5.5.(X11;.U;.Linux.2.4;.en-US;.0.8).Gecko/20010409')
    req.add_header('Accept','*/*')
    req.add_header('Accept-Language','en')
    req.add_header('Accept-Encoding','gzip,deflate,compress,identity')
    req.add_header('Keep-Alive','300')
    req.add_header('Connection','keep-alive')
    print '\n'.join(['%s' %k for k in (req.get_full_url(), req.headers)])

    response=ClientCookie.urlopen(req)
#    response=urllib2.urlopen(req)
    print '\n', response.info()
    if response: print response.read()

if __name__ == '__main__':
    import os, sys
    print 'Linux', ''.join(['%s'*3 %os.uname()[2:]])
    print 'Python', sys.version
    try: h=sys.argv[1]
    except IndexError: sys.exit(__doc__)
    test(h)
----------snip----------

[2]----------snip----------
Linux 2.4.19-pb11#3 SMP Sun Mar 30 06:32:53 CEST 2003i686
Python 2.2.2 (#3, Apr  7 2003, 05:07:52)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]
http://www.python.org
{'Host': 'www.python.org', 'Accept-Language': 'en', 'Accept-Encoding':
'gzip,deflate,compress,identity', 'Connection': 'keep-alive', 'Keep-Alive':
'300', 'Accept': '*/*', 'User-agent':
'Mozilla/5.5.(X11;.U;.Linux.2.4;.en-US;.0.8).Gecko/20010409'}
Date: Sun, 06 Apr 2003 16:00:56 GMT
Server: Apache/1.3.26 (Unix)
Last-Modified: Fri, 04 Apr 2003 15:59:44 GMT
ETag: "5a7585-3876-3e8dabf0"
Accept-Ranges: bytes
Content-Length: 14454
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><?xml-stylesheet
href="./css/ht2html.css" type="text/css"?>
[...cut...]
----------snip----------


[3]----------snip----------
Linux 2.4.19-pb11#3 SMP Sun Mar 30 06:32:53 CEST 2003i686
Python 2.2.2 (#3, Apr  7 2003, 05:07:52)
[GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)]
http://www.lycos.fr
{'Host': 'www.lycos.fr', 'Accept-Language': 'en', 'Accept-Encoding':
'gzip,deflate,compress,identity', 'Connection': 'keep-alive', 'Keep-Alive':
'300', 'Accept': '*/*', 'User-agent':
'Mozilla/5.5.(X11;.U;.Linux.2.4;.en-US;.0.8).Gecko/20010409'}
Traceback (most recent call last):
  File "test111-03.py", line 33, in ?
    test(h)
  File "test111-03.py", line 22, in test
    response=urllib2.urlopen(req)
  File "/usr/lib/python2.2/urllib2.py", line 138, in urlopen
    return _opener.open(url, data)
[...cut...]
  File "/usr/lib/python2.2/urllib2.py", line 425, in http_error_302
    self.inf_msg + msg, headers, fp)
urllib2.HTTPError: HTTP Error 302: The HTTP server returned a redirect error
that wouldlead to an infinite loop.
The last 302 error message was:
Found
----------snip----------


[4]----------snip----------
[...cut...]
  File "/usr/lib/python2.2/site-packages/ClientCookie/_ClientCookie.py", line 2098, in http_error_302
    raise HTTPError(req.get_full_url(), code,
----------snip----------
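
Note on dumps [3] and [4]: the traceback says that the redirects loop, but not
where they point. Below is a minimal sketch of how the redirect target could be
inspected. It is written for Python 3's urllib.request (the successor of
urllib2), not for the Python 2.2 used above, and it talks to a throwaway local
server that always redirects to itself, so it is an illustration of the error
and not a diagnosis of www.lycos.fr. LoopHandler and probe_redirect_loop are
hypothetical names made up for this sketch.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoopHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a 302 to the same path."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def probe_redirect_loop():
    """Return (status code, Location header) of the error raised by the loop."""
    server = HTTPServer(("127.0.0.1", 0), LoopHandler)
    worker = threading.Thread(target=server.serve_forever)
    worker.start()
    # An empty ProxyHandler keeps environment proxy settings out of the test.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        opener.open("http://127.0.0.1:%d/" % server.server_port)
    except urllib.error.HTTPError as err:
        # The exception carries the headers of the last 302 response.
        return err.code, err.headers.get("Location")
    finally:
        server.shutdown()
        worker.join()
        server.server_close()
    return None
```

Against a real site, one would pass its URL to opener.open and look at the
returned Location value to see which address the server keeps sending the
client back to.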

[5]----------snip----------
0x0000 4500 014b bc2f 4000 4006 528e 3e93 a00c     E..K./@.@.R.>...
0x0010 c26d 89e2 813c 0050 bceb f313 2dbb 0d7b     .m...<.P....-..{
0x0020 5018 16d0 03cf 0000 4745 5420 2f20 4854     P.......GET./.HT
0x0030 5450 2f31 2e30 0d0a 486f 7374 3a20 7777     TP/1.0..Host:.ww
0x0040 772e 7079 7468 6f6e 2e6f 7267 0d0a 5573     w.python.org..Us
0x0050 6572 2d61 6765 6e74 3a20 5079 7468 6f6e     er-agent:.Python    <-------- ???
----------snip----------
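
Note on trace [5]: the IP total length field (0x014b) gives a packet of 331
bytes, while only the first 96 bytes are shown, so the trace alone cannot tell
whether the headers set with add_header follow later in the same request. As
an alternative to tcpdump, here is a minimal sketch that sends one request to
a throwaway local server and returns the headers the server actually received.
It is written for Python 3's urllib.request, so it says nothing definitive
about urllib2 in Python 2.2; RecordingHandler and headers_seen_by_server are
hypothetical names made up for this sketch.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: stores the headers of the GET it receives."""

    seen = {}

    def do_GET(self):
        # Lower-case the names so lookups do not depend on capitalisation.
        type(self).seen = {k.lower(): v for k, v in self.headers.items()}
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def headers_seen_by_server(custom_headers):
    """Send one GET with custom_headers and return what the server received."""
    RecordingHandler.seen = {}
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    server.timeout = 5  # do not wait forever if the request never arrives
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    # An empty ProxyHandler keeps environment proxy settings out of the test.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        req = urllib.request.Request("http://127.0.0.1:%d/" % server.server_port)
        for name, value in custom_headers.items():
            req.add_header(name, value)
        with opener.open(req) as response:
            response.read()
    finally:
        worker.join()
        server.server_close()
    return RecordingHandler.seen
```

Calling headers_seen_by_server({'User-agent': '...', 'Accept': '*/*'}) and
printing the result shows the complete header set as the server saw it. As far
as I know, current CPython versions set "Connection: close" themselves, so a
keep-alive value passed this way may not arrive unchanged.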


Thanks for any help


---
Patrick Bussi
patrick.bussi at space.alcatel.fr


Any opinions expressed are my own and not necessarily those of my Company.






