Fastest way to retrieve and write html contents to file

DFS nospam at dfs.com
Mon May 2 01:59:51 EDT 2016


On 5/2/2016 1:15 AM, Stephen Hansen wrote:
> On Sun, May 1, 2016, at 10:00 PM, DFS wrote:
>> I tried the 10-loop test several times with all versions.
>
> Also how, _exactly_, are you testing this?
>
> C:\Python27>python -m timeit "filename='C:\\test.txt';
> webpage='http://econpy.pythonanywhere.com/ex/001.html'; import urllib2;
> r = urllib2.urlopen(webpage); f = open(filename, 'w');
> f.write(r.read()); f.close();"
> 10 loops, best of 3: 175 msec per loop
>
> That's a whole lot less than the 0.88 secs.

Indeed.


---------------------------------------------------------------------
import requests, urllib, urllib2, pycurl
import time

webpage = "http://econpy.pythonanywhere.com/ex/001.html"
webfile = "D:\\econpy001.html"
loops   = 10

startTime = time.clock()	
for i in range(loops):
	urllib.urlretrieve(webpage,webfile)
endTime = time.clock()		
print "Finished urllib in %.2g seconds" %(endTime-startTime)

startTime = time.clock()	
for i in range(loops):
	r = urllib2.urlopen(webpage)
	f = open(webfile,"w")
	f.write(r.read())
	f.close()	# call it; a bare f.close does nothing
endTime = time.clock()		
print "Finished urllib2 in %.2g seconds" %(endTime-startTime)

startTime = time.clock()	
for i in range(loops):
	r = requests.get(webpage)
	f = open(webfile,"w")
	f.write(r.text)
	f.close()	# call it; a bare f.close does nothing
endTime = time.clock()		
print "Finished requests in %.2g seconds" %(endTime-startTime)

startTime = time.clock()	
for i in range(loops):
	with open(webfile + str(i) + ".txt", 'wb') as f:
		c = pycurl.Curl()
		c.setopt(c.URL, webpage)
		c.setopt(c.WRITEDATA, f)
		c.perform()
		c.close()
endTime = time.clock()		
print "Finished pycurl in %.2g seconds" %(endTime-startTime)
---------------------------------------------------------------------

$ python getHTML.py
Finished urllib in 0.88 seconds
Finished urllib2 in 0.83 seconds
Finished requests in 0.89 seconds
Finished pycurl in 1.1 seconds

Those results are consistent across runs.  They go up or down a little,
but never below 0.82 seconds (for urllib2) or above 1.2 seconds (for pycurl).
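
When setting these numbers next to the timeit figure quoted above, keep in mind that timeit reports the time per loop (the best of 3 runs), while the script prints the total for all 10 loops. Below is a rough sketch of a helper that reports both. It assumes Python 3 (time.perf_counter), and the names time_loops and format_stats are made up for illustration; they are not part of the script above.

```python
import time


def time_loops(func, loops=10, repeats=3):
    """Run func() `loops` times per run, for `repeats` runs.

    Returns the best and worst total run time, plus the best time
    per loop, which is the figure timeit prints.
    """
    totals = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            func()
        totals.append(time.perf_counter() - start)
    best = min(totals)
    return {
        "best_total": best,
        "worst_total": max(totals),
        "best_per_loop": best / loops,
    }


def format_stats(name, stats):
    """Format the stats as a single line, in seconds and msec per loop."""
    return "%s: best %.3g s total, %.3g msec per loop, worst run %.3g s" % (
        name,
        stats["best_total"],
        stats["best_per_loop"] * 1000.0,
        stats["worst_total"],
    )


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with one of the download
    # variants from the script to time the real thing.
    print(format_stats("dummy", time_loops(lambda: sum(range(1000)))))
```

Any of the four download variants can be wrapped in a function and passed as func, so that all of them are measured the same way.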

VBScript is consistently 0.44 to 0.48 seconds.
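
One possible variation on the urllib2 loop in the script, sketched under assumptions: copy the response to disk in chunks with shutil.copyfileobj inside a with block, so the file is always closed and the body is written as raw bytes. This is not claimed to be faster than the versions timed above. The sketch assumes Python 3, where urllib2.urlopen lives at urllib.request.urlopen; save_stream and chunk_size are illustrative names.

```python
import shutil


def save_stream(response, path, chunk_size=64 * 1024):
    """Copy a binary file-like object (such as an HTTP response) to path.

    The file is opened in binary mode, so the bytes are written unchanged,
    and the with block closes it even if the copy raises.
    """
    with open(path, "wb") as f:
        shutil.copyfileobj(response, f, chunk_size)


# Possible usage with the names from the script (Python 3):
#
#   from urllib.request import urlopen
#   with urlopen(webpage) as r:
#       save_stream(r, webfile)
```

Because save_stream only needs an object with a read() method, it can be tried out on an io.BytesIO buffer without touching the network.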
