Fastest way to retrieve and write html contents to file

DFS nospam at dfs.com
Mon May 2 00:06:54 EDT 2016


I posted a little while ago about how short the python code was:

-------------------------------------
import urllib
urllib.urlretrieve(webpage, filename)
-------------------------------------

Which is very sweet compared to the VBScript version:

------------------------------------------------------
Option Explicit
Dim xmlHTTP, fso, fOut
Set xmlHTTP = CreateObject("MSXML2.serverXMLHTTP")
xmlHTTP.Open "GET", webpage
xmlHTTP.Send
Set fso = CreateObject("Scripting.FileSystemObject")
Set fOut = fso.CreateTextFile(filename, True)
fOut.WriteLine xmlHTTP.ResponseText
fOut.Close
Set fOut = Nothing
Set fso  = Nothing
Set xmlHTTP = Nothing
------------------------------------------------------

Then I tested them in loops. The VBScript is MUCH faster: 0.44 seconds 
for 10 iterations, vs 0.88 seconds for Python.
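
A loop test of this kind can be sketched roughly as below. This is only a minimal illustration, not the exact harness behind the numbers above: `time_loop` and the stand-in `fetch` function are hypothetical names, and `fetch` does no network access, so a real download-and-write function would need to be swapped in.

```python
from timeit import default_timer  # wall-clock timer available in Python 2 and 3


def time_loop(func, iterations=10):
    """Call func() `iterations` times and return the total elapsed seconds."""
    start = default_timer()
    for _ in range(iterations):
        func()
    return default_timer() - start


if __name__ == "__main__":
    # Hypothetical stand-in workload so the sketch runs offline;
    # replace it with a function that fetches the page and writes the file.
    def fetch():
        return sum(range(1000))

    print("%.2f seconds for 10 iterations" % time_loop(fetch, 10))
```

Timing the whole loop rather than a single call smooths out per-request jitter, though the total still includes DNS lookup and connection setup on every iteration.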

webpage = 'http://econpy.pythonanywhere.com/ex/001.html'


So I tried:
---------------------------
import urllib2
r = urllib2.urlopen(webpage)
# write the response body; the with-block closes the file on exit
with open(filename, "w") as f:
    f.write(r.read())
---------------------------
and
---------------------------
import requests
r = requests.get(webpage)
# write the decoded body; the with-block closes the file on exit
with open(filename, "w") as f:
    f.write(r.text)
---------------------------
and
---------------------------------
import pycurl
with open(filename, 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, webpage)
    c.setopt(c.WRITEDATA, f)
    c.perform()
    c.close()
---------------------------------

urllib2 and requests were about the same speed as urllib.urlretrieve, 
while pycurl was significantly slower (1.2 seconds).

I'm running Windows 8.1 with Python 2.7.11, 32-bit.

I know it's asking a lot, but is there a really fast AND really short 
python solution for this simple thing?


Thanks!




