urllib2 rate limiting
Nick Craig-Wood
nick at craig-wood.com
Fri Jan 11 04:30:04 EST 2008
Dimitrios Apostolou <jimis at gmx.net> wrote:
> I want to limit the download speed when using urllib2. In particular,
> having several parallel downloads, I want to make sure that their total
> speed doesn't exceed a maximum value.
>
> I can't find a simple way to achieve this. After researching a can try
> some things but I'm stuck on the details:
>
> 1) Can I overload some method in _socket.py to achieve this, and perhaps
> make this generic enough to work even with other libraries than urllib2?
>
> 2) There is the urllib.urlretrieve() function which accepts a reporthook
> parameter.
Here is an implementation based on that idea. I've used urllib rather
than urllib2 as that is what I'm familiar with.
------------------------------------------------------------
#!/usr/bin/python
"""
Fetch a url rate limited
Syntax: rate URL local_file_name
"""
import os
import sys
import urllib
from time import time, sleep
class RateLimit(object):
"""Rate limit a url fetch"""
def __init__(self, rate_limit):
"""rate limit in kBytes / second"""
self.rate_limit = rate_limit
self.start = time()
def __call__(self, block_count, block_size, total_size):
total_kb = total_size / 1024
downloaded_kb = (block_count * block_size) / 1024
elapsed_time = time() - self.start
if elapsed_time != 0:
rate = downloaded_kb / elapsed_time
print "%d kb of %d kb downloaded %f.1 kBytes/s\n" % (downloaded_kb ,total_kb, rate),
expected_time = downloaded_kb / self.rate_limit
sleep_time = expected_time - elapsed_time
print "Sleep for", sleep_time
if sleep_time > 0:
sleep(sleep_time)
def main():
"""Fetch the contents of urls"""
if len(sys.argv) != 4:
print 'Syntax: %s "rate in kBytes/s" URL "local output path"' % sys.argv[0]
raise SystemExit(1)
rate_limit, url, out_path = sys.argv[1:]
rate_limit = float(rate_limit)
print "Fetching %r to %r with rate limit %.1f" % (url, out_path, rate_limit)
urllib.urlretrieve(url, out_path, reporthook=RateLimit(rate_limit))
if __name__ == "__main__": main()
------------------------------------------------------------
Use it like this
$ ./rate-limited-fetch.py 16 http://some/url/or/other z
Fetching 'http://some/url/or/other' to 'z' with rate limit 16.0
0 kb of 10118 kb downloaded 0.000000.1 kBytes/s
Sleep for -0.0477550029755
8 kb of 10118 kb downloaded 142.073242.1 kBytes/s
Sleep for 0.443691015244
16 kb of 10118 kb downloaded 32.130966.1 kBytes/s
Sleep for 0.502038002014
24 kb of 10118 kb downloaded 23.952789.1 kBytes/s
Sleep for 0.498028993607
32 kb of 10118 kb downloaded 21.304672.1 kBytes/s
Sleep for 0.497982025146
40 kb of 10118 kb downloaded 19.979510.1 kBytes/s
Sleep for 0.497948884964
48 kb of 10118 kb downloaded 19.184721.1 kBytes/s
Sleep for 0.498008966446
...
1416 kb of 10118 kb downloaded 16.090774.1 kBytes/s
Sleep for 0.499262094498
1424 kb of 10118 kb downloaded 16.090267.1 kBytes/s
Sleep for 0.499293088913
1432 kb of 10118 kb downloaded 16.089760.1 kBytes/s
Sleep for 0.499292135239
1440 kb of 10118 kb downloaded 16.089254.1 kBytes/s
Sleep for 0.499267101288
...
--
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick
More information about the Python-list
mailing list