Newbie question: Why does read() method of urllib hang?

Alan Runyan runyaga at noeggsorspam.runyaga.com
Mon Feb 11 17:59:12 EST 2002


>     mystring=u.read()
>
> works fine, but sometimes it just hangs. Is this simply because the remote
> server is not responding? I would have thought that would cause
> urllib.urlopen() to hang, not u.read().

Andrew, what version of Python are you running?  A friend of mine, whom I am
trying to convert to Python, ran into this exact problem.
He was trying to do an HTTP POST to a web page that was assigning him
cookies and redirecting him (the Real World ;).  urllib
doesn't handle this very well at all ;'(.  He reported to me that urlopen()
was hanging, so I gave it a go.  I'm using Python 2.1.2 and I could not
reproduce this.
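
On the original question of why read() can block when urlopen() did not: urlopen() returns once the status line and headers have arrived, while read() keeps waiting until the body is complete or the server closes the connection. The sketch below shows the same effect on a bare socket, not on urllib itself. It assumes a Python new enough to have socket.settimeout() and socket.socketpair() (neither is in the 2.1.2 mentioned above), and read_with_timeout is a made-up helper name.

```python
import socket

def read_with_timeout(sock, nbytes, seconds):
    """Try to read up to nbytes from sock, giving up after `seconds`.

    Returns the bytes received, or None if the peer sent nothing in
    time.  Hypothetical helper, for illustration only.
    """
    sock.settimeout(seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer is connected but silent: without a timeout this
        # recv() would block indefinitely, which is the "hang".
        return None
```

With a connected peer that never sends anything, the call returns None after the timeout instead of blocking forever; once the peer sends data, the same call returns it.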

So what I attempted was to rewrite what he assumed urlopen() would do for
him, and now I am stuck.  I'm not quite sure how
cookies and redirects work together.  I know urllib2 gives you some
more options, but I believe it is *very* unintuitive.  We really need
examples.  Here is my code; if someone could take a look at it and see what I
am trying to do, I would greatly appreciate it.

-- snip! --
import urllib2, urllib, urlparse
from urllib2 import Request
import httplib

DEBUG = 1

class CookieHTTPRedirectHandler(urllib2.HTTPRedirectHandler,
                                urllib2.HTTPHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        if DEBUG:
            print 'was going to ' + req._Request__original + str(req.headers)

        import pdb; pdb.set_trace()

        if headers.has_key('location'):
            newurl = headers['location']
        elif headers.has_key('uri'):
            newurl = headers['uri']
        else:
            print 'returning'
            return
        newurl = urlparse.urljoin(req.get_full_url(), newurl)

        # XXX Probably want to forget about the state of the current
        # request, although that might interact poorly with other
        # handlers that also use handler-specific request attributes

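        # Collect the cookies the server just set and send them back
        # under the request header name 'Cookie' (not 'Set-Cookie:').
        response_headers={}
        cookies=[]
        for head in headers.headers:
            cookie='Set-Cookie:'
            if head[:len(cookie)]==cookie:
                # keep only name=value; drop attributes such as path
                cookies.append(head[len(cookie):].split(';')[0].strip())
        if cookies:
            response_headers['Cookie']='; '.join(cookies)
        print 'redirect headers ' + str(response_headers)
        new = Request(newurl, req.get_data(), response_headers)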

        if DEBUG:
            print 'redirected to ' + new._Request__original

        new.error_302_dict = {}
        if hasattr(req, 'error_302_dict'):
            if len(req.error_302_dict)>10 or \
               req.error_302_dict.has_key(newurl):
                # HTTPError is not imported by name, so qualify it
                raise urllib2.HTTPError(req.get_full_url(), code,
                                        self.inf_msg + msg, headers, fp)
            new.error_302_dict.update(req.error_302_dict)
        new.error_302_dict[newurl] = newurl

        # Don't close the fp until we are sure that we won't use it
        # with HTTPError.
        fp.read()
        fp.close()
        print 'returning : ' + str(new.headers)
        return self.parent.open(new)

    def http_open(self, req):
        return self.do_open(httplib.HTTP, req)

class HTTPConnection:
    def __init__(self, url, request_data, headers):
        # pass the caller's headers along instead of discarding them
        self._request=urllib2.Request(url, urllib.urlencode(request_data),
                                      headers)
        self._director=urllib2.OpenerDirector()
        self._director.add_handler(CookieHTTPRedirectHandler())
        self._conn=self._director.open(self._request)

if __name__=='__main__':
    url='http://www.winemag.com/buyingGuide/login.asp'
    req_data={'LoginID':'wine',
              'LoginPassword':'enthusiast',
              'Submit':'Login' }
    winemag=HTTPConnection(url, req_data, {})
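
For the cookie half of the problem, the piece that matters is turning the Set-Cookie lines of the redirect response into a single Cookie header on the follow-up request. Below is a minimal sketch of just that step, written so that it does not depend on urllib2 at all. cookie_header is a made-up name, and the sketch deliberately ignores the domain, path, and expiry attributes that a real cookie-aware client would check before sending a cookie back.

```python
def cookie_header(raw_header_lines):
    """Build a Cookie request-header value from raw response header lines.

    raw_header_lines is a list of strings such as 'Set-Cookie: a=b; path=/'.
    Hypothetical helper: attributes after the first ';' are dropped and
    never checked.
    """
    prefix = 'set-cookie:'
    pairs = []
    for line in raw_header_lines:
        # Header names are case-insensitive, so compare in lower case.
        if line[:len(prefix)].lower() != prefix:
            continue
        value = line[len(prefix):].strip()
        # Only the leading name=value pair is sent back to the server.
        pair = value.split(';')[0].strip()
        if pair:
            pairs.append(pair)
    return '; '.join(pairs)
```

The result would go into the new request as {'Cookie': cookie_header(lines)}, and can be left out when the returned string is empty.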





