Threading hang problems with requests module

John Levine johnl at taugh.com
Mon Jan 13 20:06:25 EST 2020


I have a small script that goes down a list of domain names, does some
DNS lookups as sanity checks, and, if the checks look OK, fetches
http://{domain}/ with requests.get() and looks at whatever text is
returned.

When I run the checks in parallel with concurrent.futures, the script
invariably hangs after a while, and when I kill it, it is waiting on
thread locks.  A similar script that only does DNS lookups works fine,
so I don't think the concurrent.futures part is wrong.
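A minimal sketch of the parallel driver, assuming (from the lkup() frame in the traceback) a ThreadPoolExecutor whose futures are consumed with as_completed(). The function name run_all and the overall_timeout parameter are hypothetical, and lookup stands in for lookup1. Passing a timeout to as_completed() turns a silent hang into a TimeoutError that names the domains still outstanding; it does not stop or fix the stuck threads.

```python
import concurrent.futures
import sys


def run_all(domains, lookup, max_workers=10, overall_timeout=120):
    """Run lookup(d) for every domain in a thread pool (hypothetical helper).

    Returns (results, pending): results maps domain -> lookup() return value,
    pending is the set of domains whose lookup had not finished in time.
    """
    results = {}
    pending = set(domains)
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
    futures = {executor.submit(lookup, d): d for d in domains}
    try:
        # The timeout covers the whole loop, measured from this call.
        for future in concurrent.futures.as_completed(futures, timeout=overall_timeout):
            d = futures[future]
            pending.discard(d)
            try:
                results[d] = future.result()
            except Exception as exc:
                print("thread barf", d, exc, file=sys.stderr)
    except concurrent.futures.TimeoutError:
        print(f"still running after {overall_timeout}s:", sorted(pending), file=sys.stderr)
    finally:
        # wait=False returns at once instead of blocking in join() the way
        # leaving a "with" block does.  The worker threads keep running and
        # Python joins them at interpreter exit, so a lookup that never
        # returns still delays exit; this only makes the hang visible.
        executor.shutdown(wait=False)
    return results, pending
```

The point of the design is diagnosis: the printed list of pending domains tells you which sites to try by hand.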

Here's what I'm doing in parallel, leaving out the parts unrelated to
the web fetches.  Is there anything wrong here?  Is this a known problem
with requests or urllib3?  I'm running it on macOS under Python 3.8.1,
but had the same problem under 3.7.4.

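import requests

webtimeout = 10  # seconds

def lookup1(d):
    """Look up one domain."""
    ans = dict()  # ... stuff ...
    # ... various DNS tests ...

    # Try the web site.  Note that timeout bounds connecting and each
    # socket read, not the total time to download the response.
    try:
        r = requests.get(f"http://{d}/", timeout=webtimeout)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout,
            requests.exceptions.TooManyRedirects) as e:
        print("no web", e)
        ans['noweb'] = 1
        return ans
    except Exception as e:
        # Catch Exception rather than using a bare except, which would
        # also catch SystemExit and KeyboardInterrupt.
        print("no web, other error:", repr(e))
        ans['noweb'] = 1
        return ans

    # ... various text comparisons on r.text ...

    return ans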
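One possible cause worth ruling out: the requests documentation states that timeout is not a time limit on the entire response download. It applies to connecting and to each wait for data on the socket, so a server that keeps trickling bytes can hold a worker far longer than 10 seconds. A sketch of a guard, assuming the fetch is switched to stream=True; the names read_capped and DownloadLimitExceeded and the 30 s / 1 MB limits are hypothetical choices, not part of requests.

```python
import time


class DownloadLimitExceeded(Exception):
    """Hypothetical exception: response body too big or too slow."""


def read_capped(chunks, max_seconds=30.0, max_bytes=1_000_000, clock=time.monotonic):
    """Join byte chunks, giving up past a wall-clock or size limit.

    chunks is any iterable of bytes, e.g. r.iter_content(chunk_size=8192)
    from requests.get(..., stream=True).  The time check runs after each
    chunk arrives, so the real bound is roughly max_seconds plus one
    per-read timeout.
    """
    start = clock()
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise DownloadLimitExceeded(f"body larger than {max_bytes} bytes")
        if clock() - start > max_seconds:
            raise DownloadLimitExceeded(f"body took longer than {max_seconds}s")
    return bytes(buf)
```

Usage would be along the lines of r = requests.get(url, timeout=webtimeout, stream=True), then body = read_capped(r.iter_content(chunk_size=8192)), then r.close(). Once the stream has been consumed this way, r.text is no longer usable, so the body has to be decoded by hand (for example with r.encoding, falling back to a default when it is None).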


Here's the traceback when I kill the hung program with a couple of ^Cs:

no web HTTPConnectionPool(host='apo-taxi.info', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x1052df430>, 'Connection to apo-taxi.info timed out. (connect timeout=10)'))

[ long wait here, obviously hung ]

load: 1.58  cmd: Python 16548 waiting 7.97u 1.36s

^C^CTraceback (most recent call last):
  File "tldtaste.py", line 195, in lkup
    for future in concurrent.futures.as_completed(fl):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 244, in as_completed
    waiter.event.wait(wait_timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tldtaste.py", line 311, in <module>
    n = lkup(dl)
  File "tldtaste.py", line 200, in lkup
    print("thread barf", exc, file=sys.stderr)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 636, in __exit__
    self.shutdown(wait=True)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/thread.py", line 236, in shutdown
    t.join()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/thread.py", line 40, in _python_exit
    t.join()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
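The traceback above shows only the main thread, waiting in as_completed() and then in shutdown()'s join(); it does not show where the worker thread is stuck. A sketch for seeing every thread's stack, assuming a Unix-like system such as macOS. install_stack_dump_signal and format_all_stacks are hypothetical helper names; faulthandler.register() and sys._current_frames() are standard-library calls (the latter is underscore-prefixed and meant for debugging).

```python
import faulthandler
import signal
import sys
import threading
import traceback


def install_stack_dump_signal():
    """On Unix, make `kill -USR1 <pid>` print every thread's stack to stderr.

    faulthandler writes the dump from its C-level signal handler, so it works
    even while the main thread is blocked acquiring a lock, and the program
    keeps running afterwards.
    """
    if hasattr(signal, "SIGUSR1"):  # SIGUSR1 does not exist on Windows
        faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)


def format_all_stacks():
    """Return the current Python stack of every live thread as one string."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"--- thread {names.get(ident, '?')} ({ident}) ---\n")
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)
```

Calling install_stack_dump_signal() at startup and then running kill -USR1 on the hung process's PID would show which line each worker is blocked on, for example inside a socket read in urllib3.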
-- 
Regards,
John Levine, johnl at taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

