urllib2 timeout not working - stalls for an hour or so

Sumeet Sandhu sumeet.k.sandhu at gmail.com
Fri Sep 2 12:38:38 EDT 2016


On Friday, September 2, 2016 at 6:05:05 AM UTC-7, Peter Otten wrote:
> Sumeet Sandhu wrote:
> 
> > Hi,
> > 
> > I use urllib2 to grab google.com webpages on my Mac over my Comcast home
> > network.
> > 
> > I see about 1 error for every 50 pages grabbed. Most exceptions are
> > ssl.SSLError, very few are socket.error and urllib2.URLError.
> > 
> > The problem is that after a first exception, urllib2 occasionally stalls for
> > up to an hour (!), at either the urllib2.urlopen or response.read stage.
> > 
> > Apparently the urllib2 and socket timeouts are not effective here - how do
> > I fix this?
> > 
> > ----------------
> > import urllib2
> > import socket
> > from sys import exc_info as sysExc_info
> > timeout = 2
> > socket.setdefaulttimeout(timeout)
> > 
> > try:
> >     req = urllib2.Request(url, None, headers)
> >     response = urllib2.urlopen(req, timeout=timeout)
> >     html = response.read()
> > except:
> >     e = sysExc_info()[0]
> >     open(logfile, 'a').write('Exception: %s \n' % e)
> > < code that follows this : after the first exception, I try again for a
> > few tries >
> 
> I'd use separate try...except-s for response = urlopen() and 
> response.read(). If the problem originates with read() you could try to 
> replace it with select.select([response.fileno()], [], [], timeout) calls in 
> a loop.
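
If I'm reading that right, the read() side would turn into something like this
untested sketch (url, headers, logfile and timeout as in my snippet above; the
8192 chunk size is just a number I picked, and I'm not sure how select interacts
with data the SSL layer has already buffered):

----------------
import select
import socket
import ssl
import urllib2

timeout = 2
socket.setdefaulttimeout(timeout)

try:
    req = urllib2.Request(url, None, headers)
    response = urllib2.urlopen(req, timeout=timeout)
except (urllib2.URLError, socket.error) as e:
    open(logfile, 'a').write('Exception in urlopen: %s \n' % e)
else:
    chunks = []
    try:
        while True:
            # wait at most `timeout` seconds for the socket to become readable
            ready, _, _ = select.select([response.fileno()], [], [], timeout)
            if not ready:
                open(logfile, 'a').write('read stalled, giving up on this page \n')
                break
            chunk = response.read(8192)   # may still block briefly, capped by the socket timeout
            if not chunk:                 # empty string means the whole body has arrived
                break
            chunks.append(chunk)
    except (ssl.SSLError, socket.error) as e:
        open(logfile, 'a').write('Exception in read: %s \n' % e)
    html = ''.join(chunks)
----------------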

Thanks Peter, I will try this.
However, I suspect Comcast is rate-limiting my home use. Is there a workaround for that?
What I really need is a way to watch the whole urlopen-plus-read sequence and break out of it if Comcast stalls me for too long - one rough idea is sketched below.
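
Something I may try (untested, and fetch_with_deadline plus the 30 second
deadline are just my own made-up names and numbers): run each fetch in a child
process and kill it if it blows past a hard deadline, so nothing inside urllib2
can hang the main loop for an hour:

----------------
import multiprocessing
import Queue
import urllib2

def fetch(url, headers, result_queue, timeout=2):
    # Runs in a child process so the parent can walk away if it hangs.
    try:
        req = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(req, timeout=timeout)
        result_queue.put(response.read())
    except Exception:
        result_queue.put(None)    # signal failure; the parent logs or retries

def fetch_with_deadline(url, headers, deadline=30):
    # Hard upper bound on the whole urlopen + read, whatever stalls inside it.
    result_queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=fetch, args=(url, headers, result_queue))
    worker.start()
    try:
        html = result_queue.get(timeout=deadline)   # waits at most `deadline` seconds
    except Queue.Empty:
        html = None                                 # the fetch is stuck; abandon it
    if worker.is_alive():
        worker.terminate()                          # kill the stalled child outright
    worker.join()
    return html
----------------

Then the main loop would call html = fetch_with_deadline(url, headers) and
retry or skip the page whenever it returns None.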


