urllib.urlopen + https = all threads locked?

Ng Pheng Siong ngps at madcap.dyndns.org
Sat Oct 27 11:13:51 EDT 2001


According to Jeff Johnson <usenet at jeffjohnson.net>:
> I have a multi-threaded application that makes calls
> urllib.urlopen("https://yadayada.com").  Until the call completes (or
> maybe the following read) all other threads just freeze.  They
> continue working after the call completes so it took me a while to
> realize there was a problem.
> 
> Someone suggested:
> 
> "Sounds like the C code that does the SSL is not _releasing_ 
> the global thread lock."

Yup, sounds like it.

<plug>
Try M2Crypto, it does threading. (Just don't do the info_callback thingy
for now. ;-)

M2Crypto also does SSL session re-use.

Here's a demo that I posted a little while ago. For serious usage you
should replace the single 'GET ??? HTTP/1.0' line with calls to either
http[s]lib or urllib.


  """M2Crypto.SSL.Session client demo: This program requests a URL from 
  a HTTPS server, saves the negotiated SSL session id, parses the HTML 
  returned by the server, then requests each HREF in a separate thread 
  using the saved SSL session id.
  
  Copyright (c) 1999-2000 Ng Pheng Siong. All rights reserved."""
  
  RCS_id='$Id: sess.py,v 1.2 2000/09/11 14:52:29 ngps Exp ngps $'
  
  from M2Crypto import Err, Rand, SSL, X509, threading
  m2_threading = threading; del threading
  
  import formatter, getopt, htmllib, sys
  from threading import Thread
  from socket import gethostname
  
  
  def handler(sslctx, host, port, href, recurs=0, sslsess=None):
  
      s = SSL.Connection(sslctx)
      if sslsess:
          s.set_session(sslsess)
          s.connect((host, port))
      else:
          s.connect((host, port))
          sslsess = s.get_session()
      #print sslsess.as_text()
  
      if recurs:
          p = htmllib.HTMLParser(formatter.NullFormatter())
  
      f = s.makefile("rw")
      f.write(href)
      f.flush()
  
      while 1:
          data = f.read()
          if not data:
              break
          if recurs:
              p.feed(data)
  
      if recurs:
          p.close()
  
      f.close()
  
      if recurs:
          for a in p.anchorlist:
              req = 'GET %s HTTP/1.0\r\n\r\n' % a
              thr = Thread(target=handler, 
                          args=(sslctx, host, port, req, recurs-1, sslsess))
              print "Thread =", thr.getName()
              thr.start()
      
  
  if __name__ == '__main__':
  
      m2_threading.init()
      Rand.load_file('../randpool.dat', -1) 
  
      host = '127.0.0.1'
      port = 443
      req = '/'
  
      optlist, optarg = getopt.getopt(sys.argv[1:], 'h:p:r:')
      for opt in optlist:
          if '-h' in opt:
              host = opt[1]
          elif '-p' in opt:
              port = int(opt[1])
          elif '-r' in opt:
              req = opt[1]
      
      ctx = SSL.Context('sslv3')
      ctx.load_cert('client.pem')
      ctx.load_verify_info('ca.pem')
      ctx.load_client_ca('ca.pem')
      ctx.set_verify(SSL.verify_none, 10)
      
      req = 'GET %s HTTP/1.0\r\n\r\n' % req
  
      start = Thread(target=handler, args=(ctx, host, port, req, 1))
      print "Thread =", start.getName()
      start.start()
      start.join()
      
      m2_threading.cleanup()
      Rand.save_file('../randpool.dat')

</plug>


-- 
Ng Pheng Siong <ngps at post1.com> * http://www.post1.com/home/ngps




More information about the Python-list mailing list