Fastest way to retrieve and write html contents to file

Stephen Hansen me+python at ixokai.io
Mon May 2 03:58:02 EDT 2016


On Mon, May 2, 2016, at 12:37 AM, DFS wrote:
> On 5/2/2016 2:27 AM, Stephen Hansen wrote:
> > I'm again going back to the point of: its fast enough. When comparing
> > two small numbers, "twice as slow" is meaningless.
> 
> Speed is always meaningful.
> 
> I know python is relatively slow, but it's a cool, concise, powerful 
> language.  I'm extremely impressed by how tight the code can get.

I'm sorry, but no. Speed is not always meaningful. 

It's not even usually meaningful, because you can't quantify what "speed
is". In context, you're claiming this is twice as slow (even though my
tests show dramatically better performance), but what details are
different?

You're ignoring the fact that Python might have a constant overhead --
meaning, for a 1k download, it might have X speed cost. For a 1meg
download, it might still have the exact same X cost.

Looking narrowly, that overhead looks like "twice as slow", but that's
not meaningful at all. Looking larger, that overhead is a pittance.

You aren't measuring that.

> > You have an assumption you haven't answered, that downloading a 10 meg
> > file will be twice as slow as downloading this tiny file. You haven't
> > proven that at all.
> 
> True.  And it has been my assumption - tho not with 10MB file.

And that assumption is completely invalid.

> I noticed urllib and curl returned the html as is, but urllib2 and 
> requests added enhancements that should make the data easier to parse. 
> Based on speed and functionality and documentation, I believe I'll be 
> using the requests HTTP library (I will actually be doing a small amount 
> of web scraping).

The requests library's added-value is ease-of-use, and its overhead is
likely tiny: so using it means you spend less effort making a thing
happen. I recommend you embrace this. 

> VBScript
> 1st run: 7.70 seconds
> 2nd run: 5.38
> 3rd run: 7.71
> 
> So python matches or beats VBScript at this much larger file.  Kewl.

This is what I'm talking about: Python might have a constant overhead,
but looking at larger operations, its probably comparable. Not fast,
mind you. Python isn't the fastest language out there. But in real world
work, its usually fast enough.

-- 
Stephen Hansen
  m e @ i x o k a i . i o



More information about the Python-list mailing list