httplib slow read

Kjetil Jacobsen setattr at yahoo.no
Thu Dec 6 05:26:14 EST 2001


John Hunter <jdhunter at nitace.bsd.uchicago.edu> wrote in message news:<m2itbliumd.fsf at mother.paradise.lost>...
> >>>>> "Toby" == Toby Dickenson <tdickenson at devmail.geminidataloggers.co.uk> writes:
> 
>     Toby> It could well be a problem with your hand-crafted http
>     Toby> request. I suggest you go with urllib.
> 
> I needed to set some headers, like Referer and Cookie, which is why I
> went with httplib.  I sniffed port 80 to find out what was being
> sent by my browser, and then constructed the headers from that info,
> so I think my headers were ok.  Can't say for sure.
> 
> I suppose the headers can also be set with the urlencode format of
> urllib, so this is probably the way to go; thanks for the suggestion.
> Still curious why the read is so slow with httplib, though.
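
For reference, urlencode only encodes form or query data; request headers such as Referer and Cookie are set on the request itself. A minimal sketch, assuming Python 3's urllib.request (which postdates this thread); the helper names build_request and fetch, and the URL and cookie values shown in the usage note, are placeholders rather than anything from the original exchange:

```python
import urllib.request


def build_request(url, referer, cookie):
    """Return a Request carrying custom Referer and Cookie headers."""
    return urllib.request.Request(
        url,
        headers={"Referer": referer, "Cookie": cookie},
    )


def fetch(url, referer, cookie, timeout=10):
    """Send the request and return the response body as bytes."""
    req = build_request(url, referer, cookie)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Calling fetch('http://www.python.org/', 'http://www.python.org/doc/', 'session=abc') would send both headers with the GET request; build_request alone does no network I/O, so the headers can be inspected before anything is sent.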

another option may be to use the pycurl module which wraps the
curl library:

>>> import pycurl
>>> f = open('output', 'wb')  # file to store the document in (binary mode)
>>> c = pycurl.init()
>>> c.setopt(pycurl.URL, 'http://www.python.org')
>>> c.setopt(pycurl.FILE, f)
>>> c.perform()

pycurl is pretty efficient and in my experience performs faster
than httplib and urllib, in particular when you have multiple
python threads concurrently downloading documents.
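
The session above uses the pycurl.init() and pycurl.FILE spellings from an early pycurl release. As far as I know, later releases spell these pycurl.Curl() and pycurl.WRITEDATA, and custom headers go through pycurl.HTTPHEADER as a list of 'Name: value' strings. A rough sketch under those assumptions; format_headers and fetch_to_file are hypothetical helper names, not part of pycurl:

```python
def format_headers(headers):
    """Turn a dict into the 'Name: value' strings used by pycurl.HTTPHEADER."""
    return ["%s: %s" % (name, value) for name, value in headers.items()]


def fetch_to_file(url, path, headers=None):
    """Download url into path, optionally sending extra request headers."""
    # Imported here so format_headers stays usable without pycurl installed.
    import pycurl

    c = pycurl.Curl()
    try:
        # Binary mode: libcurl hands back raw bytes.
        with open(path, "wb") as f:
            c.setopt(pycurl.URL, url)
            c.setopt(pycurl.WRITEDATA, f)
            if headers:
                c.setopt(pycurl.HTTPHEADER, format_headers(headers))
            c.perform()
    finally:
        # Release the underlying libcurl handle even if perform() raises.
        c.close()
```

Something like fetch_to_file('http://www.python.org', 'output', {'Referer': 'http://www.python.org/'}) would then cover the Referer and Cookie case from earlier in the thread. Each thread should use its own Curl object when downloading concurrently.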

curl:  http://curl.haxx.se
pycurl: http://pycurl.sf.net/

regards,
        - kjetil


