Fastest way to retrieve and write html contents to file

DFS nospam at dfs.com
Mon May 2 21:51:31 EDT 2016


On 5/2/2016 3:19 AM, Chris Angelico wrote:

> There's an easier way to test if there's caching happening. Just crank
> the iterations up from 10 to 100 and see what happens to the times. If
> your numbers are perfectly fair, they should be perfectly linear in
> the iteration count; eg a 1.8 second ten-iteration loop should become
> an 18 second hundred-iteration loop. Obviously they won't be exactly
> that, but I would expect them to be reasonably close (eg 17-19
> seconds, but not 2 seconds).

100 loops
Finished VBScript in 3.953 seconds
Finished VBScript in 3.608 seconds
Finished VBScript in 3.610 seconds

That works out to roughly 0.036 to 0.040 seconds per loop, a bit of a
per-loop speedup going from 10 to 100.
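
For anyone who wants to repeat the linearity check on the Python side, here is a rough sketch (Python 3), not the script I actually ran. The names time_loop, per_iteration and linearity are made up for illustration, and fetch is a hypothetical stand-in for whatever does the real download.

```python
import time

def time_loop(fetch, iterations):
    """Call fetch() `iterations` times and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fetch()
    return time.perf_counter() - start

def per_iteration(total_seconds, iterations):
    """Average cost of one iteration."""
    return total_seconds / iterations

def linearity(t_small, n_small, t_large, n_large):
    """Ratio of per-iteration costs; close to 1.0 means the times scale linearly."""
    return per_iteration(t_large, n_large) / per_iteration(t_small, n_small)

if __name__ == "__main__":
    def fetch():
        # Hypothetical stand-in for the real download (assumption, not the real test)
        time.sleep(0.001)

    t10 = time_loop(fetch, 10)
    t100 = time_loop(fetch, 100)
    print("per-loop ratio (100 vs 10): %.2f" % linearity(t10, 10, t100, 100))
```

A ratio well below 1.0 would suggest the later iterations are getting cheaper, which is what caching would look like.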


> Then the next thing to test would be to create a deliberately-slow web
> server, and connect to that. Put a two-second delay into it, to
> simulate a distant or overloaded server, and see if your logs show the
> correct result. Something like this:
>
> --------
>
> import time
> try:
>     import http.server as BaseHTTPServer # Python 3
> except ImportError:
>     import BaseHTTPServer # Python 2
>
> class SlowHTTP(BaseHTTPServer.BaseHTTPRequestHandler):
>     def do_GET(self):
>         self.send_response(200)
>         self.send_header("Content-type", "text/html")
>         self.end_headers()
>         # Send half the body, stall, then send the rest
>         self.wfile.write(b"Hello, ")
>         time.sleep(2) # simulated slow server
>         self.wfile.write(b"world!")
>
> server = BaseHTTPServer.HTTPServer(("", 1234), SlowHTTP)
> server.serve_forever()
>
> -------
>
> Test that with a web browser or command-line downloader (go to
> http://127.0.0.1:1234/), and make sure that (a) it produces "Hello,
> world!", and (b) it takes two seconds. Then set your test scripts to
> downloading that URL. (Be sure to set them back to low iteration
> counts first!) If the times are true and fair, they should all come
> out pretty much the same - ten iterations, twenty seconds. And since
> all that's changed is the server, this will be an accurate
> demonstration of what happens in the real world: network requests
> aren't always fast. Incidentally, you can also watch the server's log
> to see if it's getting the appropriate number of requests.
>
> It may turn out that changing the web server actually materially
> changes your numbers. Comment out the sleep call and try it again -
> you might find that your numbers come closer together, because this
> naive server doesn't send back 304 Not Modified responses or anything.
> Again, though, this would prove that you're not actually measuring
> language performance, because the tests are more dependent on the
> server than the client.
>
> Even if the files themselves aren't being cached, you might find that
> DNS is. So if you truly want to eliminate variables, replace the name
> in your URL with an IP address. It's another thing that might mess
> with your timings, without actually being a language feature.
>
> Networking has about four billion variables in it. You're messing with
> one of the least significant: the programming language :)
>
> ChrisA

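A minimal Python 3 client for the slow-server test could look something like the sketch below. It assumes the server above is listening on http://127.0.0.1:1234/. The function name timed_fetch and its opener parameter are my own invention for illustration; opener just defaults to urllib.request.urlopen so a fake can be swapped in for a dry run.

```python
import time
import urllib.request

def timed_fetch(url, iterations, opener=urllib.request.urlopen):
    """Fetch url `iterations` times.

    Returns two lists: seconds taken per request and bytes received per request.
    """
    times, sizes = [], []
    for _ in range(iterations):
        start = time.perf_counter()
        # Read the whole body so the server's delay is included in the timing
        with opener(url) as response:
            body = response.read()
        times.append(time.perf_counter() - start)
        sizes.append(len(body))
    return times, sizes
```

With the slow server running, timed_fetch("http://127.0.0.1:1234/", 10) should return ten times of about two seconds each if nothing is being cached, and the server's log should show ten requests.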

Thanks for the good feedback.




