HTTPSConnection script fails, but only on some servers (long)

pyguy2 at gmail.com
Wed Apr 13 14:20:10 EDT 2005


I have a couple of recipes on the Python Cookbook site that allow
Python to do proxy auth and SSL. The easiest one is:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/301740
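
For anyone reading this on Python 3, here is a minimal sketch of the same
idea. It is not the recipe above. It assumes the standard library's
http.client, whose set_tunnel() method sends the CONNECT request to the
proxy before the switch to HTTPS. The helper names basic_credentials and
tunnelled_connection, and all host names, are made up for illustration.

```python
import base64
from http.client import HTTPSConnection


def basic_credentials(user, passwd):
    """Build the value of a Basic (Proxy-)Authorization header."""
    raw = ("%s:%s" % (user, passwd)).encode("utf-8")
    # b64encode adds no trailing newline, so the value is safe in a header
    return "Basic " + base64.b64encode(raw).decode("ascii")


def tunnelled_connection(proxy_host, proxy_port, target_host,
                         target_port=443, proxy_user=None, proxy_passwd=None):
    """Prepare an HTTPS connection to target_host through an HTTP proxy.

    Nothing is sent on the network until request() is called.
    """
    headers = {}
    if proxy_user is not None:
        headers["Proxy-Authorization"] = basic_credentials(proxy_user,
                                                           proxy_passwd)
    # Connect to the proxy; set_tunnel() arranges for a CONNECT to the target
    conn = HTTPSConnection(proxy_host, proxy_port)
    conn.set_tunnel(target_host, target_port, headers=headers)
    return conn
```

Usage would look something like
conn = tunnelled_connection("proxy.example", 3128, "B.example")
followed by conn.request("GET", "/some/path"), with only the path in the
request and never the full URL.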

john

andreas at kostyrka.org wrote:
> Well, HTTPSConnection does not support proxies (HTTP CONNECT, then
> a switch to HTTPS).
>
> And it never has. Although the code seems to make sense, there is
> no support for handling that switch. Probably a good thing to
> complain about (file a new bug report).
>
> In the meantime you should take a look at cURL and pycurl, which
> support all kinds of more extreme HTTP (FTP, etc.) handling, like
> using https over a proxy.
>
> Andreas
>
> On Tue, Apr 12, 2005 at 03:37:33AM -0400, Steve Holden wrote:
> > Paul Winkler wrote:
> > >This is driving me up the wall... any help would be MUCH
> > >appreciated.
> > >I have a module that I've whittled down into a 65-line script in
> > >an attempt to isolate the cause of the problem.
> > >
> > >(Real domain names have been removed in everything below.)
> > >
> > >SYNOPSIS:
> > >
> > >I have 2 target servers, at https://A.com and https://B.com.
> > >I have 2 clients, wget and my python script.
> > >Both clients are sending GET requests with exactly the
> > >same urls, parameters, and auth info.
> > >
> > >wget works fine with both servers.
> > >The python script works with server A, but NOT with server B.
> > >On Server B, it provoked a "Bad Gateway" error from Apache.
> > >In other words, the problem seems to depend on both the client
> > >and the server. Joy.
> > >
> > >Logs on server B show malformed URLs ONLY when the client
> > >is my python script, which suggests the script is broken...
> > >but logs on server A show no such problem, which suggests
> > >the problem is elsewhere.
> > >
> > >DETAILS
> > >
> > >Note, the module was originally written for the express
> > >purpose of working with B.com;  A.com was added as a point of
> > >reference to convince myself that the script was not totally
> > >insane.
> > >Likewise, wget was tried when I wanted to see if it might be
> > >a client problem.
> > >
> > >Note the servers are running different software and return
> > >different headers. wget -S shows this when it (successfully) hits
> > >url A:
> > >
> > > 1 HTTP/1.1 200 OK
> > > 2 Date: Tue, 12 Apr 2005 05:23:54 GMT
> > > 3 Server: Zope/(unreleased version, python 2.3.3, linux2) ZServer/1.1
> > > 4 Content-Length: 37471
> > > 5 Etag:
> > > 6 Content-Type: text/html;charset=iso-8859-1
> > > 7 X-Cache: MISS from XXX.com
> > > 8 Keep-Alive: timeout=15, max=100
> > > 9 Connection: Keep-Alive
> > >
> > >... and this when it (successfully) hits url B:
> > >
> > > 1 HTTP/1.1 200 OK
> > > 2 Date: Tue, 12 Apr 2005 04:51:30 GMT
> > > 3 Server: Jetty/4.2.9 (Linux/2.4.26-g2-r5-cti i386 java/1.4.2_03)
> > > 4 Via: 1.0 XXX.com
> > > 5 Content-Length: 0
> > > 6 Connection: close
> > > 7 Content-Type: text/plain
> > >
> > >The only notable things to me, apart from the servers, are the
> > >"Via:" and "Connection:" headers. Also the "Content-Length: 0"
> > >from B is odd, but that doesn't seem to be a problem when the
> > >client is wget.
> > >
> > >Sadly I don't grok HTTP well enough to spot anything really
> > >suspicious.
> > >
> > >The apache ssl request log on server B is very interesting.
> > >When my script hits it, the request logged is like:
> > >
> > >A.com - - [01/Apr/2005:17:04:46 -0500] "GET https://A.com/SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406 HTTP/1.1" 502 351
> > >
> > >... which apart from the 502, I thought reasonable until I
> > >realized there's not supposed to be a protocol or domain in there
> > >at all.  So this is clearly wrong. When the client is wget, the
> > >log shows something more sensible like:
> > >
> > >A.com - - [01/Apr/2005:17:11:04 -0500] "GET /SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406 HTTP/1.0" 200 -
> > >
> > >... which looks identical except for not including the spurious
> > >protocol and domain, and the response looks as expected (200 with
> > >size 0).
> > >
> > >So, that log appears to be strong evidence that the problem is in
> > >my client script, right?  The failing request is coming in with
> > >some bad crap in the path, which Jboss can't handle so it barfs
> > >and Apache responds with Bad Gateway.  Right?
> > >
> > >So why does the same exact client code work when hitting server
> > >A??  No extra gunk in the logs there. AFAICT there is nothing in
> > >the script that could lead to such an odd request only on server
> > >B.
> > >
> > >
> > >THE SCRIPT
> > >
> > >#!/usr/bin/python2.3
> > >
> > >from httplib import HTTPSConnection
> > >from urllib import urlencode
> > >import re
> > >import base64
> > >
> > >url_re = re.compile(r'^([a-z]+)://([A-Za-z0-9._-]+)(:[0-9]+)?')
> > >
> > >target_urls = {
> > >    'B': 'https://B/SkinServlet/zopeskin',
> > >    'A': 'https://A/zope/manage_main',
> > >}
> > >
> > >auth_info= {'B':    ('userXXX', 'passXXX'),
> > >            'A':    ('userXXX', 'passXXX'),
> > >            }
> > >
> > >def doRequest(target, **kw):
> > >    """Provide a trivial interface for doing remote calls.
> > >    Keyword args are passed as query parameters.
> > >    """
> > >    url = target_urls[target]
> > >    user, passwd = auth_info[target]
> > >    proto,host,port=url_re.match(url).groups()
> > >    if port:
> > >        port = int(port[1:])   # remove the ':' ...
> > >    else:
> > >        port = 443
> > >    creds = base64.encodestring("%s:%s" % (user, passwd))
> > >    headers = {"Authorization": "Basic %s" % creds }
> > >    params = urlencode(kw).strip()
> > >    if params:
> > >        url = '%s?%s' % (url, params)
> > >    body = None # only needed for POST
> > >    args =('GET', url, body, headers)
> > >    print "ARGS: %s" % str(args)
> > >    conn = HTTPSConnection(host)
> > >    conn.request(*args)
> > >    response = conn.getresponse()
> > >    data = response.read()
> > >    if response.status >= 300:
> > >        print
> > >        msg = '%i ERROR reported by remote system %s\n' % (
> > >            response.status, url)
> > >        msg += data
> > >        raise IOError, msg
> > >    print "OK!"
> > >    return data
> > >
> > >if __name__ == '__main__':
> > >    print "attempting to connect..."
> > >    result1 = doRequest('A', skey='id', rkey='id')
> > >    result2 = doRequest('B', action='updateSkinId',
> > >                        skinId='406',  facilityId='1466')
> > >    print "done!"
> > >
> > >
> > ># EOF
> > >
> > >
> > >So... what the heck is wrong here?
> > >
> > >at-wits-end-ly y'rs,
> > >
> > >Paul Winkler
> > >
> > Paul:
> >
> > I don't claim to have analyzed exactly what's going on here, but
> > the most significant difference between the two is that you are
> > accessing site B using HTTP 1.1 via an HTTP 1.0 proxy (as
> > indicated by the "Via:" header).
> >
> > Whether this is a clue or a red herring time alone will tell.
> >
> > It's possible that wget and your client code aren't using the same
> > proxy settings, for example.
> >
> > regards
> >  Steve
> > --
> > Steve Holden        +1 703 861 4237  +1 800 494 3119
> > Holden Web LLC             http://www.holdenweb.com/
> > Python Web Programming  http://pydish.holdenweb.com/
> > 
> > -- 
> > http://mail.python.org/mailman/listinfo/python-list
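
One possible reading of the two access-log entries quoted above, offered
as an assumption rather than a confirmed diagnosis: the script hands the
full URL to conn.request(), and the library writes that string into the
request line as given, while wget sends only the path and query. A minimal
Python 3 sketch of splitting the URL first follows. split_for_request is a
hypothetical helper name, the host in the usage note is a placeholder, and
any query string already present in the URL is ignored.

```python
from urllib.parse import urlencode, urlsplit


def split_for_request(url, **params):
    """Return (host, port, target) for an https URL.

    target holds only the path and query string, which is what belongs
    in the request line when talking to the server directly.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    query = urlencode(params)
    if query:
        target = "%s?%s" % (target, query)
    # Default to the standard HTTPS port when the URL names none
    return parts.hostname, parts.port or 443, target
```

With this, a call such as
host, port, target = split_for_request("https://b.example/SkinServlet/zopeskin", action="updateSkinId")
gives a target of "/SkinServlet/zopeskin?action=updateSkinId", which can be
passed to conn.request("GET", target, None, headers) on a connection opened
to host and port.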



