[Python-Dev] mysterious hangs in socket code

Jeremy Hylton jeremy@alum.mit.edu
Tue, 3 Sep 2002 17:53:46 -0400


I've been running a small, multi-threaded program to retrieve web
pages today.  The entire program appears to hang whenever one thread
performs a slow DNS operation, even though there is no
application-level coordination between the threads.

The motivation comes from http://www.python.org/sf/591349, but I ended
up writing a similar small test script, which I've attached.

When I run this program with Python 2.1, it produces a steady stream
of output -- urls and the time it took to load them.  Most of the
pages take less than a second, but some take a very long time.

If I run this program with Python 2.2 or 2.3, it produces little
bursts of output, then pauses for a long time, then repeats.

I believe that the problem relates to DNS lookups, but not in a way I
fully understand.  If I attach gdb to any of the threads while the
program is hung, it is always inside getaddrinfo().  My first thought
was that the socketmodule stopped wrapping DNS lookups in
Py_BEGIN/END_ALLOW_THREADS calls when the IPv6 changes were
integrated, so a single lookup would stall every thread by holding
the GIL.  But if I restore these calls --
    see http://www.python.org/sf/604210 --
I don't see any change in behavior.  The program still hangs
periodically.
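
To check whether the lookups ever run in parallel at all, here's a
quick timing test (just a sketch -- the hostnames are arbitrary, and
a caching resolver can skew repeated runs).  If the getaddrinfo()
calls overlap, the wall-clock time should be close to the slowest
single lookup; if something serializes them, it should approach the
sum of all of them:

    import socket
    import threading
    import time

    HOSTS = ["www.python.org", "www.google.com", "www.yahoo.com",
             "www.cnn.com", "www.mit.edu", "www.ibm.com"]

    times = {}

    def lookup(host):
        t0 = time.time()
        try:
            socket.getaddrinfo(host, 80)
        except socket.error:
            pass
        # dict stores are atomic under the GIL, so no lock is needed
        times[host] = time.time() - t0

    threads = [threading.Thread(target=lookup, args=(h,))
               for h in HOSTS]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.time() - t0

    total = 0.0
    for v in times.values():
        total = total + v
    # serialized lookups => wall approaches the sum;
    # parallel lookups   => wall approaches the slowest single lookup
    print "wall %.2fs, sum of lookups %.2fs" % (wall, total)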

One possibility is that the Linux getaddrinfo() is thread-safe, but
only by way of an internal lock that allows a single request to be
outstanding at a time, so concurrent lookups are serialized.
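
If that's what's happening, there's probably no way to coax
concurrent lookups out of the C library, but caching the results
would at least keep any one name from being resolved more than once
per process.  A rough sketch (caveat: connecting by IP address puts
the address in the Host: header, which breaks name-based virtual
hosts unless the real hostname is sent explicitly):

    import socket

    _dns_cache = {}

    def resolve(host):
        # Resolve each hostname at most once per process.  Two threads
        # may race and both call gethostbyname() for the same host;
        # that's harmless -- they both just store the same answer.
        try:
            return _dns_cache[host]
        except KeyError:
            ip = socket.gethostbyname(host)   # the one blocking lookup
            _dns_cache[host] = ip
            return ip

    # In the attached script's _fetch(), using
    #     h = httplib.HTTPConnection(resolve(host))
    # should take the repeated DNS lookups out of the per-request path.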

Not sure what the other possibilities are, but the current behavior is
awful.

Jeremy

---------------------------------------------------------------------
import httplib
import random
import sys
import threading
import time
import traceback
import urlparse

headers = {"Accept":
           "text/plain, text/html, image/jpeg, image/jpg, "
           "image/gif, image/png, */*"}

class URLThread(threading.Thread):

    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue
        self._stopevent = threading.Event()

    def stop(self):
        self._stopevent.set()
    
    def run(self):
        while not self._stopevent.isSet():
            self.fetch()

    def fetch(self):
        url = self._queue.get()
        t0 = time.time()
        try:
            self._fetch(url)
        except:  # log any failure but keep the worker thread running
            etype, value, tb = sys.exc_info()
            L = ["Error occurred fetching %s\n" % url,
                 "%s: %s\n" % (etype, value),
                 ]
            L += traceback.format_tb(tb)
            sys.stderr.write("".join(L))
        t1 = time.time()
        print url, round(t1 - t0, 2)

    def _fetch(self, url):
        parts = urlparse.urlparse(url)
        host = parts[1]
        path = parts[2]
        h = httplib.HTTPConnection(host)
        h.connect()
        h.request("GET", path, headers=headers)
        r = h.getresponse()
        r.read()
        h.close()

urls = """\
http://www.andersen.com/
http://www.google.com/
http://www.google.com/images/logo.gif
http://www.microsoft.com/
http://www.microsoft.com/homepage/gif/bnr-microsoft.gif
http://www.microsoft.com/homepage/gif/1ptrans.gif
http://www.microsoft.com/library/toolbar/images/curve.gif
http://www.yahoo.com/
http://www.sourceforge.net/
http://www.slashdot.org/
http://www.kuro5hin.org/
http://www.intel.com/
http://www.aol.com/
http://www.amazon.com/
http://www.cnn.com/
http://money.cnn.com/
http://www.expedia.com/
http://www.tripod.com/
http://www.hotmail.com/
http://www.angelfire.com/
http://www.excite.com/
http://www.verisign.com/
http://www.riaa.com/
http://www.enron.com/
http://www.securityspace.com/
http://www.directv.com/
http://www.att.com/
http://www.qwest.com/
http://www.covad.com/
http://www.sprint.com/
http://www.mci.com/
http://www.worldcom.com/
"""
urls = [u for u in urls.split("\n") if u]

REPEAT = 10
THREADS = 8

class RandomQueue:
    """Stand-in for Queue.Queue: hands out random URLs, never blocks."""

    def __init__(self, L):
        self.list = L

    def get(self):
        return random.choice(self.list)
        
if __name__ == "__main__":
    urlq = RandomQueue(urls)

    # Force a thread switch every 10 bytecodes to expose the problem.
    sys.setcheckinterval(10)

    threads = []
    for i in range(THREADS):
        t = URLThread(urlq)
        t.start()
        threads.append(t)

    # Sleep until the user interrupts with Ctrl-C.
    while 1:
        try:
            time.sleep(30)
        except KeyboardInterrupt:
            break

    print "Shutting down threads..."
    for t in threads:
        t.stop()
    for t in threads:
        t.join()