Newbie question: Why does read() method of urllib hang?
Alan Runyan
runyaga at noeggsorspam.runyaga.com
Mon Feb 11 17:59:12 EST 2002
> mystring=u.read()
>
> works fine, but sometimes it just hangs. Is this simply because the remote
> server is not responding? I would have thought that would cause
> urllib.urlopen() to hang, not u.read().
Andrew, what version of Python are you running? A friend of mine, whom I am
trying to convert to Python, ran into this exact problem.
He was trying to do an HTTP POST to a web page that assigned him
cookies and then redirected him (the Real World ;). urllib
doesn't handle this very well at all ;'(. He reported to me that urlopen() was
hanging, so I gave it a go. I'm using Python 2.1.2 and I could not
reproduce this.
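On the quoted question: as far as I understand it, urlopen() returns as soon as the status line and headers have arrived, and the body is only pulled off the socket by read(). So a server that sends its headers and then stalls would hang read(), not urlopen(). Below is a minimal sketch of that behaviour, written for present-day Python 3 (where httplib is named http.client) rather than the 2.1.2 used in this thread. The socketpair stand-in for the remote server, the function name and the 0.5 second timeout are illustrative assumptions.

```python
import http.client
import socket


def read_after_headers(timeout=0.5):
    """Parse response headers, then try to read a body that never arrives."""
    # A connected pair of sockets stands in for the client and the remote server.
    client, server = socket.socketpair()
    # The "server" promises 10 bytes of body but sends only the headers.
    server.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 10\r\n\r\n")

    # Without a timeout the read() below would block indefinitely.
    client.settimeout(timeout)
    response = http.client.HTTPResponse(client)
    response.begin()  # returns once the status line and headers are parsed
    status = response.status

    try:
        response.read()  # waits for the 10 promised bytes
        outcome = "read() finished"
    except socket.timeout:
        outcome = "read() timed out"
    finally:
        response.close()
        client.close()
        server.close()
    return status, outcome


if __name__ == "__main__":
    print(read_after_headers())
```

The headers parse without delay and only the read() call runs into the timeout. In Python 3, urlopen() accepts a timeout argument that bounds the later read() in the same way.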
So what I attempted was to rewrite what he assumed urlopen() would do for
him, and now I am stuck. I'm not quite sure how
cookies and redirects work together. I know urllib2 gives you some
more options, but I believe it is *very* unintuitive. We really need
examples. Here is my code; if someone could take a look at it and see what I
am trying to do, I would greatly appreciate it.
-- snip! --
import urllib2, urllib, urlparse
from urllib2 import Request
import httplib
DEBUG = 1
class CookieHTTPRedirectHandler(urllib2.HTTPRedirectHandler,
                                urllib2.HTTPHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        if DEBUG:
            print 'was going to ' + req._Request__original + \
                  str(req.headers)
            import pdb; pdb.set_trace()
        if headers.has_key('location'):
            newurl = headers['location']
        elif headers.has_key('uri'):
            newurl = headers['uri']
        else:
            print 'returning'
            return
        newurl = urlparse.urljoin(req.get_full_url(), newurl)
        # XXX Probably want to forget about the state of the current
        # request, although that might interact poorly with other
        # handlers that also use handler-specific request attributes
        response_headers={}
        for head in headers.headers:
            cookie='Set-Cookie:'
            if head[:len(cookie)]==cookie:
                response_headers[cookie]=head[len(cookie)+1:]
        print 'redirect headers ' + str(response_headers)
        new = Request(newurl, req.get_data(), response_headers)
        if DEBUG:
            print 'redirected to ' + new._Request__original
        new.error_302_dict = {}
        if hasattr(req, 'error_302_dict'):
            if len(req.error_302_dict)>10 or \
               req.error_302_dict.has_key(newurl):
                raise urllib2.HTTPError(req.get_full_url(), code,
                                        self.inf_msg + msg, headers, fp)
            new.error_302_dict.update(req.error_302_dict)
        new.error_302_dict[newurl] = newurl
        # Don't close the fp until we are sure that we won't use it
        # with HTTPError.
        fp.read()
        fp.close()
        print 'returning : ' + str(new.headers)
        return self.parent.open(new)

    def http_open(self, req):
        return self.do_open(httplib.HTTP, req)
class HTTPConnection:
    def __init__(self, url, request_data, headers):
        self._request=urllib2.Request(url, urllib.urlencode(request_data),
                                      {})
        self._director=urllib2.OpenerDirector()
        self._director.add_handler(CookieHTTPRedirectHandler())
        self._conn=self._director.open(self._request)
if __name__=='__main__':
    url='http://www.winemag.com/buyingGuide/login.asp'
    req_data={'LoginID':'wine',
              'LoginPassword':'enthusiast',
              'Submit':'Login' }
    winemag=HTTPConnection(url, req_data, {})
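
On how cookies and redirects fit together: the server hands out cookies in Set-Cookie headers on the 302 response, and the client is expected to send the name=value pairs back in a single Cookie header on the follow-up request (attributes such as path and expires are not echoed). The handler above instead copies the raw value under a 'Set-Cookie:' key. Below is a hedged sketch of the intended flow, in present-day Python 3 (urllib2 became urllib.request and the Cookie module became http.cookies). The names cookie_header_from and CookieForwardingRedirectHandler are made up for illustration, and the sketch ignores the domain, path and expiry rules a real cookie jar would enforce.

```python
import urllib.request
from email.message import Message
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_values):
    """Build a Cookie request-header value from Set-Cookie response values."""
    pairs = []
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)  # attributes such as path= are stored on the morsel
        for name, morsel in parsed.items():
            pairs.append("%s=%s" % (name, morsel.coded_value))
    return "; ".join(pairs)


class CookieForwardingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: carry cookies set by a redirect to the next request."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler build the follow-up request first.
        new = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new is not None:
            cookie = cookie_header_from(headers.get_all("Set-Cookie") or [])
            if cookie:
                new.add_header("Cookie", cookie)
        return new


if __name__ == "__main__":
    # Simulate the 302 response headers without touching the network.
    headers = Message()
    headers["Location"] = "http://example.com/members"
    headers["Set-Cookie"] = "ASPSESSIONID=ABC123; path=/"
    original = urllib.request.Request("http://example.com/login",
                                      data=b"LoginID=wine")
    handler = CookieForwardingRedirectHandler()
    followup = handler.redirect_request(
        original, None, 302, "Found", headers, headers["Location"])
    print(followup.get_method(), followup.get_header("Cookie"))
```

Note that the stock redirect_request() in Python 3 turns the POST into a GET for the follow-up, so the form data is not re-sent. In practice the standard library's cookie-jar support (http.cookiejar together with urllib.request.HTTPCookieProcessor) does this bookkeeping, including the matching rules skipped here.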