HTTPSConnection script fails, but only on some servers (long)
andreas at kostyrka.org
Wed Apr 13 06:18:32 EDT 2005
Well, HTTPSConnection does not support proxies (an HTTP CONNECT request
followed by a switch to HTTPS), and it never has. Although the code looks
as if it should work, there is no support for handling that switch.
Probably a good thing to complain about (file a new bug report).
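To make "that switch" concrete, here is a rough sketch of what a client has to do by hand, written for a current Python 3 rather than the 2.3 used in the script. The function names and the proxy host/port arguments are made up for illustration, and I have not run this against a real proxy:

```python
import socket
import ssl


def build_connect_request(host, port=443):
    """Return the bytes that ask a proxy to open a raw tunnel to host:port."""
    target = "%s:%d" % (host, port)
    lines = ["CONNECT %s HTTP/1.1" % target, "Host: %s" % target, "", ""]
    return "\r\n".join(lines).encode("ascii")


def open_tunnel(proxy_host, proxy_port, host, port=443):
    """Connect to the proxy, request a tunnel, then switch the socket to TLS."""
    sock = socket.create_connection((proxy_host, proxy_port))
    sock.sendall(build_connect_request(host, port))
    # Read the proxy's reply up to the blank line that ends its headers.
    reply = b""
    while b"\r\n\r\n" not in reply:
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    status_line = reply.split(b"\r\n", 1)[0]
    parts = status_line.split()
    if len(parts) < 2 or parts[1] != b"200":
        sock.close()
        raise IOError("proxy refused CONNECT: %r" % status_line)
    # From here on the proxy only relays bytes, so TLS is negotiated
    # end to end with the target server.
    context = ssl.create_default_context()
    return context.wrap_socket(sock, server_hostname=host)
```

The socket returned by open_tunnel() is an ordinary TLS socket; the HTTP request itself is then written to it exactly as it would be on a direct connection.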
In the meantime you should take a look at cURL and pycurl, which do support
all kinds of more unusual HTTP (FTP, etc.) handling, such as using HTTPS
over a proxy.
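One more thing that may be worth checking: the request line in the quoted Apache log carries a full https:// URL, which is the form a client normally sends to a proxy, and the script hands the full URL to conn.request(). A minimal sketch, again in current Python 3, of passing only the path and query instead. split_target is a made-up helper name and b.example is a placeholder host; whether this cures the 502 is an assumption I have not tested:

```python
from urllib.parse import urlsplit, urlencode


def split_target(url, params=None):
    """Split an https URL into (host, port, request_target).

    request_target holds only the path and query string, which is what
    a server expects in the request line of a direct (non-proxy) request.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    query = urlencode(params or {})
    if parts.query and query:
        # Keep any query already present in the URL and append the new one.
        query = parts.query + "&" + query
    else:
        query = parts.query or query
    target = path + ("?" + query if query else "")
    return parts.hostname, parts.port or 443, target
```

With host, port and target from this helper, the call would look like HTTPSConnection(host, port) followed by conn.request('GET', target, None, headers).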
Andreas
On Tue, Apr 12, 2005 at 03:37:33AM -0400, Steve Holden wrote:
> Paul Winkler wrote:
> >This is driving me up the wall... any help would be MUCH appreciated.
> >I have a module that I've whittled down into a 65-line script in
> >an attempt to isolate the cause of the problem.
> >
> >(Real domain names have been removed in everything below.)
> >
> >SYNOPSIS:
> >
> >I have 2 target servers, at https://A.com and https://B.com.
> >I have 2 clients, wget and my python script.
> >Both clients are sending GET requests with exactly the
> >same urls, parameters, and auth info.
> >
> >wget works fine with both servers.
> >The python script works with server A, but NOT with server B.
> >On Server B, it provoked a "Bad Gateway" error from Apache.
> >In other words, the problem seems to depend on both the client
> >and the server. Joy.
> >
> >Logs on server B show malformed URLs ONLY when the client
> >is my python script, which suggests the script is broken...
> >but logs on server A show no such problem, which suggests
> >the problem is elsewhere.
> >
> >DETAILS
> >
> >Note, the module was originally written for the express
> >purpose of working with B.com; A.com was added as a point of reference
> >to convince myself that the script was not totally insane.
> >Likewise, wget was tried when I wanted to see if it might be
> >a client problem.
> >
> >Note the servers are running different software and return different
> >headers. wget -S shows this when it (successfully) hits url A:
> >
> > 1 HTTP/1.1 200 OK
> > 2 Date: Tue, 12 Apr 2005 05:23:54 GMT
> > 3 Server: Zope/(unreleased version, python 2.3.3, linux2) ZServer/1.1
> > 4 Content-Length: 37471
> > 5 Etag:
> > 6 Content-Type: text/html;charset=iso-8859-1
> > 7 X-Cache: MISS from XXX.com
> > 8 Keep-Alive: timeout=15, max=100
> > 9 Connection: Keep-Alive
> >
> >... and this when it (successfully) hits url B:
> >
> > 1 HTTP/1.1 200 OK
> > 2 Date: Tue, 12 Apr 2005 04:51:30 GMT
> > 3 Server: Jetty/4.2.9 (Linux/2.4.26-g2-r5-cti i386 java/1.4.2_03)
> > 4 Via: 1.0 XXX.com
> > 5 Content-Length: 0
> > 6 Connection: close
> > 7 Content-Type: text/plain
> >
> >The only notable things to me, apart from the servers, are the "Via:" and
> >"Connection:" headers. Also the "Content-Length: 0" from B is odd, but
> >that doesn't seem to be a problem when the client is wget.
> >
> >Sadly I don't grok HTTP well enough to spot anything really
> >suspicious.
> >
> >The apache ssl request log on server B is very interesting.
> >When my script hits it, the request logged is like:
> >
> >A.com - - [01/Apr/2005:17:04:46 -0500] "GET
> >https://A.com/SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406
> >HTTP/1.1" 502 351
> >
> >... which apart from the 502, I thought reasonable until I realized there's
> >not supposed to be a protocol or domain in there at all. So this is clearly
> >wrong. When the client is wget, the log shows something more sensible like:
> >
> >A.com - - [01/Apr/2005:17:11:04 -0500] "GET
> >/SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406
> >HTTP/1.0" 200 -
> >
> >... which looks identical except for not including the spurious
> >protocol and domain, and the response looks as expected (200 with size
> >0).
> >
> >So, that log appears to be strong evidence that the problem is in my client
> >script, right? The failing request is coming in with some bad crap in
> >the path, which JBoss can't handle so it barfs and Apache responds with
> >Bad Gateway. Right?
> >
> >So why does the same exact client code work when hitting server A??
> >No extra gunk in the logs there. AFAICT there is nothing in the script
> >that could lead to such an odd request only on server B.
> >
> >
> >THE SCRIPT
> >
> >#!/usr/bin/python2.3
> >
> >from httplib import HTTPSConnection
> >from urllib import urlencode
> >import re
> >import base64
> >
> >url_re = re.compile(r'^([a-z]+)://([A-Za-z0-9._-]+)(:[0-9]+)?')
> >
> >target_urls = {
> >    'B': 'https://B/SkinServlet/zopeskin',
> >    'A': 'https://A/zope/manage_main',
> >}
> >
> >auth_info= {'B': ('userXXX', 'passXXX'),
> >            'A': ('userXXX', 'passXXX'),
> >            }
> >
> >def doRequest(target, **kw):
> >    """Provide a trivial interface for doing remote calls.
> >    Keyword args are passed as query parameters.
> >    """
> >    url = target_urls[target]
> >    user, passwd = auth_info[target]
> >    proto,host,port=url_re.match(url).groups()
> >    if port:
> >        port = int(port[1:]) # remove the ':' ...
> >    else:
> >        port = 443
> >    creds = base64.encodestring("%s:%s" % (user, passwd))
> >    headers = {"Authorization": "Basic %s" % creds }
> >    params = urlencode(kw).strip()
> >    if params:
> >        url = '%s?%s' % (url, params)
> >    body = None # only needed for POST
> >    args =('GET', url, body, headers)
> >    print "ARGS: %s" % str(args)
> >    conn = HTTPSConnection(host)
> >    conn.request(*args)
> >    response = conn.getresponse()
> >    data = response.read()
> >    if response.status >= 300:
> >        print
> >        msg = '%i ERROR reported by remote system %s\n' % (response.status,
> >                                                             url)
> >        msg += data
> >        raise IOError, msg
> >    print "OK!"
> >    return data
> >
> >if __name__ == '__main__':
> >    print "attempting to connect..."
> >    result1 = doRequest('A', skey='id', rkey='id')
> >    result2 = doRequest('B', action='updateSkinId',
> >                        skinId='406', facilityId='1466')
> >    print "done!"
> >
> >
> ># EOF
> >
> >
> >So... what the heck is wrong here?
> >
> >at-wits-end-ly y'rs,
> >
> >Paul Winkler
> >
> Paul:
>
> I don't claim to have analyzed exactly what's going on here, but the
> most significant difference between the two is that you are accessing
> site B using HTTP 1.1 via an HTTP 1.0 proxy (as indicated by the "Via:"
> header).
>
> Whether this is a clue or a red herring time alone will tell.
>
> It's possible that wget and your client code aren't using the same proxy
> settings, for example.
>
> regards
> Steve
> --
> Steve Holden +1 703 861 4237 +1 800 494 3119
> Holden Web LLC http://www.holdenweb.com/
> Python Web Programming http://pydish.holdenweb.com/
>
> --
> http://mail.python.org/mailman/listinfo/python-list