Fastest way to retrieve and write html contents to file

DFS nospam at dfs.com
Mon May 2 02:47:01 EDT 2016


On 5/2/2016 2:05 AM, Steven D'Aprano wrote:
> On Monday 02 May 2016 15:00, DFS wrote:
>
>> I tried the 10-loop test several times with all versions.
>>
>> The results were 100% consistent: VBScript xmlHTTP was always 2x faster
>> than any python method.
>
>
> Are you absolutely sure you're comparing the same job in two languages?

As near as I can tell.  In VBScript I'm actually releasing the various 
objects (setting them to Nothing), which adds to the time, but I don't do 
that in Python.  I don't know enough to say whether it's necessary, or 
good practice, or what.




> Is VB using a local web cache, and Python not?

I'm not specifying a local web cache with either (wouldn't know how or 
where to look).  If you have Windows, you can try it.
-------------------------------------------------------------------
Option Explicit
Dim xmlHTTP, fso, fOut, startTime, endTime, webpage, webfile,i
webpage = "http://econpy.pythonanywhere.com/ex/001.html"
webfile  = "D:\econpy001.html"
startTime = Timer
For i = 1 to 10
  Set xmlHTTP = CreateObject("MSXML2.serverXMLHTTP")
  xmlHTTP.Open "GET", webpage
  xmlHTTP.Send
  Set fso = CreateObject("Scripting.FileSystemObject")
  Set fOut = fso.CreateTextFile(webfile, True)
  fOut.WriteLine xmlHTTP.ResponseText
  fOut.Close
  Set fOut    = Nothing
  Set fso     = Nothing
  Set xmlHTTP = Nothing
Next
endTime = Timer
wscript.echo "Finished VBScript in " & FormatNumber(endTime - startTime,3) & " seconds"
-------------------------------------------------------------------
Save it to a .vbs file and run it like this:
$ cscript /nologo filename.vbs
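
For comparison, a rough Python counterpart of that loop might look like the sketch below. It assumes Python 3's urllib.request as a stand-in for urllib2, and the function name fetch_and_save and its opener parameter are made up for illustration; it is not the exact script that was timed.

```python
import time
from urllib.request import urlopen


def fetch_and_save(url, path, loops=10, opener=urlopen):
    """Download `url` and overwrite `path` with the body, `loops` times.

    Returns the elapsed wall-clock time in seconds.
    `opener` is a hypothetical hook so the fetch step can be swapped out.
    """
    start = time.perf_counter()
    for _ in range(loops):
        # Retrieve the page as raw bytes (no text decoding).
        with opener(url) as response:
            body = response.read()
        # Open, write and close the output file on every pass,
        # like CreateTextFile / WriteLine / Close in the VBScript.
        with open(path, "wb") as out:
            out.write(body)
    return time.perf_counter() - start
```

Calling fetch_and_save(webpage, "econpy001.html") with the same URL would give a number to set against the VBScript timing. One difference to keep in mind: ResponseText hands back decoded text and WriteLine appends a line ending, while this sketch writes the raw bytes unchanged, so the two are close but not byte-for-byte the same job.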


> Are you saving files with both
> tests? To the same local drive? (To ensure you aren't measuring the
> difference between "write this file to a slow IDE hard disk, write that file
> to a fast SSD".)

Identical functionality (retrieve webpage, write html to file).  Same 
webpage, written to the same folder on the same hard drive (not SSD).

The 10 file writes (open/write/close) don't make a meaningful difference 
at all:
VBScript 0.0156 seconds
urllib2  0.0034 seconds

The downloaded file is 3.55 KB.
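
The file-write portion can be timed on its own with something like the sketch below. It is only an illustration: time_file_writes is a made-up name, the payload is dummy bytes of about the same size as the page, and the temp file may not land on the same drive as the real test.

```python
import os
import tempfile
import time


def time_file_writes(payload, loops=10):
    """Return the seconds spent on `loops` open/write/close cycles."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)  # only the path is needed; the loop reopens the file
    try:
        start = time.perf_counter()
        for _ in range(loops):
            with open(path, "wb") as out:
                out.write(payload)
        return time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file


payload = b"x" * 3635  # roughly 3.55K of filler, not the real page
print("10 writes took %.4f seconds" % time_file_writes(payload))
```

If the number printed is a tiny fraction of the total run time, the gap between the two languages has to come from the retrieval step rather than from the disk.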


> Once you are sure that you are comparing the same task in two languages,
> then make sure the measurement is meaningful. If you change from a (let's
> say) 1 KB file to a 100 KB file, do you see the same 2 x difference? What if
> you increase it to a 10000 KB file?

Do you know a webpage I can hit 10x repeatedly to download a good-sized 
file?  I'm always paranoid they'll block me, thinking I'm a 
"professional" web scraper or something.
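
One way around hammering somebody else's site might be to serve test files of different sizes from the local machine. The sketch below is an assumption-laden illustration, not something that was run for the numbers above: SizedHandler and time_downloads are made-up names, and the server simply answers GET /<n> with n bytes of filler.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class SizedHandler(BaseHTTPRequestHandler):
    """Answers GET /<n> with n bytes of filler."""

    def do_GET(self):
        size = int(self.path.lstrip("/") or 0)
        body = b"x" * size
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the timing output quiet


def time_downloads(sizes, loops=10):
    """Return {size: seconds} for `loops` downloads of each size."""
    server = HTTPServer(("127.0.0.1", 0), SizedHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    opener = build_opener(ProxyHandler({}))  # ignore any proxy settings
    results = {}
    try:
        for size in sizes:
            url = "http://127.0.0.1:%d/%d" % (server.server_port, size)
            start = time.perf_counter()
            for _ in range(loops):
                with opener.open(url) as response:
                    response.read()
            results[size] = time.perf_counter() - start
    finally:
        server.shutdown()
        server.server_close()
    return results
```

Something like time_downloads([1024, 100 * 1024, 10000 * 1024]) would cover the 1 KB, 100 KB and 10000 KB cases, and the VBScript could be pointed at the same local URLs. The catch is that loopback takes the network out of the picture, so this only compares the overhead on the client side.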

Thanks




