threads and urllib

Sandy Norton sandskyfly at hotmail.com
Wed Feb 20 15:28:56 EST 2002


This simple little program retrieves web pages using threads and
urllib. It seems to work fine with many websites; however, more
often than not it just hangs, and the thread (or is it the
socket?) never times out and dies.

Is there any way to specify a time limit within which the operation
must complete, and to enforce it in the code?
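[Editor's note: one possible way to impose such a limit is at the socket
level. Later Python versions grew socket.setdefaulttimeout(), which makes
every blocking call on a newly created socket raise a timeout exception
instead of hanging forever. The sketch below (in current Python syntax,
not the Python 2 of the original post) uses a local server socket that
never replies as a stand-in for an unresponsive web server, so it runs
without network access.]

```python
import socket

# Every socket created after this call gets a one-second limit on
# blocking operations (connect, recv, ...).
socket.setdefaulttimeout(1.0)

# A local listening socket that accepts the connection but never sends
# anything -- a stand-in for a hung web server.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))   # let the OS pick a free port
server.listen(1)
host, port = server.getsockname()

client = socket.create_connection((host, port))
try:
    client.recv(1024)           # the "server" never replies...
    timed_out = False
except socket.timeout:          # ...so this fires after ~1 second
    timed_out = True
finally:
    client.close()
    server.close()

print(timed_out)
```

Because the timeout is a process-wide default, it also applies to the
sockets that urllib opens internally, without changing the fetching code.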

Any help will be much appreciated.

cheers,

Sandy

--------snip-----------------------------------------------------------

import urllib
import threading
import time

t1 = time.time()

class SerialAgent:
    # Fetch self.url and keep the page body in self.webpage.
    def run(self):
        self.webpage = urllib.urlopen(self.url).read()
        #print self.webpage

class Agent(SerialAgent, threading.Thread):
    # A thread that runs SerialAgent.run in the background.
    def __init__(self, url):
        self.url = url
        threading.Thread.__init__(self, name=url)
        self.webpage = ''

alist = []

for url in ['http://www.python.org', 'http://www.zope.org']:
    a = Agent(url)
    print a
    a.start()
    print a
    alist.append(a)

for a in alist:
    a.join(20.0)
    print a, len(a.webpage)
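[Editor's note: one thing the snippet above relies on is join(timeout),
and it is worth knowing that join(timeout) only stops *waiting* after the
timeout -- it does not kill the thread, which keeps running (and keeps
its socket open) in the background. The sketch below (again in current
Python syntax) demonstrates this with a worker that sleeps, standing in
for a hung urlopen call.]

```python
import threading
import time

def hang():
    time.sleep(2.0)   # pretend urlopen is stuck here

t = threading.Thread(target=hang)
t.start()

t.join(0.5)                     # give up waiting after half a second
still_running = t.is_alive()
print(still_running)            # join returned, but the thread lives on

t.join()                        # now wait for it to actually finish
print(t.is_alive())
```

So after `a.join(20.0)` returns, `a.is_alive()` (isAlive() in older
Pythons) tells you whether the fetch actually finished or is still hung.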
