socket timeout / m2crypto.urllib problems

John Hunter jdhunter at ace.bsd.uchicago.edu
Wed Sep 22 16:07:17 EDT 2004


I have a test script below which I use to fetch URLs into strings,
either over https or http.  Over https I use m2crypto.urllib, and over
http I use the standard urllib.  However, whenever I import socket and
call setdefaulttimeout, using m2crypto.urllib tends to cause an
httplib.BadStatusLine to be raised, even if the timeout is set very
large.  All of the documents in the test script are publicly
accessible.
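For reference, setdefaulttimeout only applies to sockets created after
the call; existing sockets keep their own timeout.  A minimal check of
that behavior (variable names here are just for illustration):

```python
import socket

# Save the current process-wide default (usually None, i.e. blocking mode).
old = socket.getdefaulttimeout()

# Any socket created after this call inherits a 200-second timeout.
socket.setdefaulttimeout(200)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
timeout = s.gettimeout()   # inherited from the new default
s.close()

# Restore the previous default so other code is unaffected.
socket.setdefaulttimeout(old)
```

So a large default timeout should not by itself abort a slow fetch; the
BadStatusLine suggests the SSL layer is mishandling the now-nonblocking
underlying socket rather than simply timing out.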

Any ideas?  Is there a better/easier way to get https docs in python?

Thanks,
JDH

import urllib, socket
from cStringIO import StringIO
from M2Crypto import Rand, SSL, m2urllib

# Comment out this line and the script generally works, but without it
# my Zope process, which uses this code, hangs.
socket.setdefaulttimeout(200)


def url_to_string(source):
    """
    get url as string, for https and http
    """
    if source.startswith('https:'):
        sh = StringIO()
        url = m2urllib.FancyURLopener()
        url.addheader('Connection', 'close')
        u = url.open(source)
        while 1:
            data = u.read(8192)   # read in blocks; read() returns '' at EOF
            if not data: break
            sh.write(data)
        u.close()
        return sh.getvalue()
    else:
        return urllib.urlopen(source).read()

if __name__ == '__main__':
    s1 = url_to_string('https://crcdocs.bsd.uchicago.edu/crcdocs/Files/informatics.doc')
    s2 = url_to_string('http://yahoo.com')
    s3 = url_to_string('https://crcdocs.bsd.uchicago.edu/crcdocs/Files/facepage.doc')
    print len(s1), len(s2), len(s3)
