HTTPSConnection script fails, but only on some servers (long)

Paul Winkler stuff at slinkp.com
Tue Apr 12 02:09:11 EDT 2005


This is driving me up the wall... any help would be MUCH appreciated.
I have a module that I've whittled down into a 65-line script in
an attempt to isolate the cause of the problem.

(Real domain names have been removed in everything below.)

SYNOPSIS

I have 2 target servers, at https://A.com and https://B.com.
I have 2 clients, wget and my python script.
Both clients are sending GET requests with exactly the
same urls, parameters, and auth info.

wget works fine with both servers.
The python script works with server A, but NOT with server B.
On server B, it provokes a "Bad Gateway" error from Apache.
In other words, the problem seems to depend on both the client
and the server. Joy.

Logs on server B show malformed URLs ONLY when the client
is my python script, which suggests the script is broken...
but logs on server A show no such problem, which suggests
the problem is elsewhere.

DETAILS

Note, the module was originally written for the express
purpose of working with B.com;  A.com was added as a point of reference
to convince myself that the script was not totally insane.
Likewise, wget was tried when I wanted to see if it might be
a client problem.

Note the servers are running different software and return different
headers. wget -S shows this when it (successfully) hits url A:

 1 HTTP/1.1 200 OK
 2 Date: Tue, 12 Apr 2005 05:23:54 GMT
 3 Server: Zope/(unreleased version, python 2.3.3, linux2) ZServer/1.1
 4 Content-Length: 37471
 5 Etag:
 6 Content-Type: text/html;charset=iso-8859-1
 7 X-Cache: MISS from XXX.com
 8 Keep-Alive: timeout=15, max=100
 9 Connection: Keep-Alive

... and this when it (successfully) hits url B:

 1 HTTP/1.1 200 OK
 2 Date: Tue, 12 Apr 2005 04:51:30 GMT
 3 Server: Jetty/4.2.9 (Linux/2.4.26-g2-r5-cti i386 java/1.4.2_03)
 4 Via: 1.0 XXX.com
 5 Content-Length: 0
 6 Connection: close
 7 Content-Type: text/plain

The only things notable to me, apart from the server software, are the
"Via:" and "Connection:" headers. Also the "Content-Length: 0" from B is
odd, but that doesn't seem to be a problem when the client is wget.

Sadly I don't grok HTTP well enough to spot anything really
suspicious.

The apache ssl request log on server B is very interesting.
When my script hits it, the request logged is like:

A.com - - [01/Apr/2005:17:04:46 -0500] "GET
https://A.com/SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406
HTTP/1.1" 502 351

... which, apart from the 502, I thought reasonable until I realized
there's not supposed to be a protocol or domain in there at all.  So
this is clearly wrong. When the client is wget, the log shows something
more sensible like:

A.com - - [01/Apr/2005:17:11:04 -0500] "GET
/SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406
HTTP/1.0" 200 -

... which looks identical except for not including the spurious
protocol and domain, and the response looks as expected (200 with size
0).
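The two log entries above differ only in the request target: one carries
the full URL, the other only the path and query string. As a minimal
sketch of that difference (not part of the original script), the helper
below splits a full URL into the host and port a connection would be
opened to, and the path-plus-query form seen in the wget entry. The name
split_request_target is hypothetical, and the code is written for a
current Python rather than 2.3.

```python
try:
    from urllib.parse import urlsplit   # Python 3
except ImportError:
    from urlparse import urlsplit       # Python 2

def split_request_target(url, default_port=443):
    """Split a full URL into (host, port, target).

    `target` is the path plus query string, i.e. the form that
    appears in the wget log entry, with no protocol or domain.
    """
    parts = urlsplit(url)
    host = parts.hostname               # normalized to lowercase
    port = parts.port or default_port   # fall back to the HTTPS default
    target = parts.path or '/'
    if parts.query:
        target = '%s?%s' % (target, parts.query)
    return host, port, target

host, port, target = split_request_target(
    'https://B.com/SkinServlet/zopeskin'
    '?action=updateSkinId&facilityId=1466&skinId=406')
```

Under this sketch, host and port would go to the connection constructor
and only target would go into the request line. Whether that alone
explains the 502 is an assumption, not something the logs prove.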

So, that log appears to be strong evidence that the problem is in my
client script, right?  The failing request is coming in with some bad
crap in the path, which JBoss can't handle so it barfs and Apache
responds with Bad Gateway.  Right?

So why does the same exact client code work when hitting server A??
No extra gunk in the logs there. AFAICT there is nothing in the script
that could lead to such an odd request only on server B.


THE SCRIPT

#!/usr/bin/python2.3

from httplib import HTTPSConnection
from urllib import urlencode
import re
import base64

url_re = re.compile(r'^([a-z]+)://([A-Za-z0-9._-]+)(:[0-9]+)?')

target_urls = {
    'B': 'https://B/SkinServlet/zopeskin',
    'A': 'https://A/zope/manage_main',
}

auth_info= {'B':    ('userXXX', 'passXXX'),
            'A':    ('userXXX', 'passXXX'),
            }

def doRequest(target, **kw):
    """Provide a trivial interface for doing remote calls.
    Keyword args are passed as query parameters.
    """
    url = target_urls[target]
    user, passwd = auth_info[target]
    proto,host,port=url_re.match(url).groups()
    if port:
        port = int(port[1:])   # remove the ':' ...
    else:
        port = 443
    creds = base64.encodestring("%s:%s" % (user, passwd))
    headers = {"Authorization": "Basic %s" % creds }
    params = urlencode(kw).strip()
    if params:
        url = '%s?%s' % (url, params)
    body = None # only needed for POST
    args =('GET', url, body, headers)
    print "ARGS: %s" % str(args)
    conn = HTTPSConnection(host)
    conn.request(*args)
    response = conn.getresponse()
    data = response.read()
    if response.status >= 300:
        print
        msg = '%i ERROR reported by remote system %s\n' % (
            response.status, url)
        msg += data
        raise IOError, msg
    print "OK!"
    return data

if __name__ == '__main__':
    print "attempting to connect..."
    result1 = doRequest('A', skey='id', rkey='id')
    result2 = doRequest('B', action='updateSkinId',
                        skinId='406',  facilityId='1466')
    print "done!"


# EOF
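
One detail of the script worth checking separately: base64.encodestring
appends a trailing newline to its output, so the Authorization value
built in doRequest ends in "\n". Whether that matters to either server
is not established here. A minimal sketch of building the same value
without the newline, using a hypothetical helper name basic_auth_header
and a current Python rather than 2.3:

```python
import base64

def basic_auth_header(user, passwd):
    """Return a Basic auth header value with no trailing newline."""
    raw = ('%s:%s' % (user, passwd)).encode('ascii')
    # b64encode, unlike encodestring, does not append '\n'
    token = base64.b64encode(raw).decode('ascii')
    return 'Basic %s' % token

# Placeholder credentials, same as in the script.
headers = {'Authorization': basic_auth_header('userXXX', 'passXXX')}
```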


So... what the heck is wrong here?

at-wits-end-ly y'rs,

Paul Winkler



