urllib2 rate limiting

Nick Craig-Wood nick at craig-wood.com
Fri Jan 11 04:30:04 EST 2008


Dimitrios Apostolou <jimis at gmx.net> wrote:
>  I want to limit the download speed when using urllib2. In particular, 
>  having several parallel downloads, I want to make sure that their total 
>  speed doesn't exceed a maximum value.
> 
>  I can't find a simple way to achieve this. After researching I can try 
>  some things, but I'm stuck on the details:
> 
>  1) Can I overload some method in _socket.py to achieve this, and perhaps 
>  make this generic enough to work even with libraries other than urllib2?
> 
>  2) There is the urllib.urlretrieve() function which accepts a reporthook 
>  parameter.

Here is an implementation based on that idea.  I've used urllib rather
than urllib2 as that is what I'm familiar with.

------------------------------------------------------------
#!/usr/bin/python

"""
Fetch a url rate limited

Syntax: rate URL local_file_name
"""

import os
import sys
import urllib
from time import time, sleep

class RateLimit(object):
    """Rate limit a url fetch"""
    def __init__(self, rate_limit):
        """rate limit in kBytes / second"""
        self.rate_limit = rate_limit
        self.start = time()
    def __call__(self, block_count, block_size, total_size):
        total_kb = total_size / 1024
        downloaded_kb = (block_count * block_size) / 1024
        elapsed_time = time() - self.start
        if elapsed_time != 0:
            rate = downloaded_kb / elapsed_time
            print "%d kb of %d kb downloaded %f.1 kBytes/s\n" % (downloaded_kb ,total_kb, rate),
            expected_time = downloaded_kb / self.rate_limit
            sleep_time = expected_time - elapsed_time
            print "Sleep for", sleep_time
            if sleep_time > 0:
                sleep(sleep_time)

def main():
    """Fetch the contents of urls"""
    if len(sys.argv) != 4:
        print 'Syntax: %s "rate in kBytes/s" URL "local output path"' % sys.argv[0]
        raise SystemExit(1)
    rate_limit, url, out_path = sys.argv[1:]
    rate_limit = float(rate_limit)
    print "Fetching %r to %r with rate limit %.1f" % (url, out_path, rate_limit)
    urllib.urlretrieve(url, out_path, reporthook=RateLimit(rate_limit))
    
if __name__ == "__main__": main()
------------------------------------------------------------

Use it like this:

$ ./rate-limited-fetch.py 16 http://some/url/or/other z
Fetching 'http://some/url/or/other' to 'z' with rate limit 16.0
0 kB of 10118 kB downloaded at 0.0 kBytes/s
Sleep for -0.0477550029755
8 kB of 10118 kB downloaded at 142.1 kBytes/s
Sleep for 0.443691015244
16 kB of 10118 kB downloaded at 32.1 kBytes/s
Sleep for 0.502038002014
24 kB of 10118 kB downloaded at 24.0 kBytes/s
Sleep for 0.498028993607
32 kB of 10118 kB downloaded at 21.3 kBytes/s
Sleep for 0.497982025146
40 kB of 10118 kB downloaded at 20.0 kBytes/s
Sleep for 0.497948884964
48 kB of 10118 kB downloaded at 19.2 kBytes/s
Sleep for 0.498008966446
...
1416 kB of 10118 kB downloaded at 16.1 kBytes/s
Sleep for 0.499262094498
1424 kB of 10118 kB downloaded at 16.1 kBytes/s
Sleep for 0.499293088913
1432 kB of 10118 kB downloaded at 16.1 kBytes/s
Sleep for 0.499292135239
1440 kB of 10118 kB downloaded at 16.1 kBytes/s
Sleep for 0.499267101288
...
...
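
Since the original question was about urllib2, which has no equivalent of
urlretrieve()'s reporthook, the same calculation can be done by reading
the response in blocks yourself.  A rough, untested sketch along those
lines (fetch_rate_limited is just a name I've made up):

------------------------------------------------------------
import urllib2
from time import time, sleep

def fetch_rate_limited(url, out_path, rate_limit, block_size=8192):
    """Fetch url to out_path at no more than rate_limit kBytes/s"""
    response = urllib2.urlopen(url)
    out = open(out_path, "wb")
    start = time()
    downloaded_kb = 0.0
    try:
        while True:
            block = response.read(block_size)
            if not block:
                break
            out.write(block)
            downloaded_kb += len(block) / 1024.0
            # Same idea as the reporthook version: work out how long the
            # download should have taken so far at the rate limit and
            # sleep off any difference
            expected_time = downloaded_kb / rate_limit
            sleep_time = expected_time - (time() - start)
            if sleep_time > 0:
                sleep(sleep_time)
    finally:
        out.close()
        response.close()
------------------------------------------------------------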
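
As for keeping the *total* speed of several parallel downloads below a
maximum, one way would be to share a single limiter object between the
download threads and have each one report the bytes it has just read.
Again only a rough, untested sketch (SharedRateLimit and throttle are my
own invention):

------------------------------------------------------------
import threading
from time import time, sleep

class SharedRateLimit(object):
    """Limit the combined rate of several downloads (kBytes/s in total)"""
    def __init__(self, rate_limit):
        self.rate_limit = rate_limit
        self.lock = threading.Lock()
        self.start = time()
        self.downloaded_kb = 0.0
    def throttle(self, nbytes):
        """Record nbytes just read and sleep if the combined average
        rate has gone over the limit"""
        self.lock.acquire()
        try:
            self.downloaded_kb += nbytes / 1024.0
            expected_time = self.downloaded_kb / self.rate_limit
            sleep_time = expected_time - (time() - self.start)
        finally:
            self.lock.release()
        if sleep_time > 0:
            sleep(sleep_time)
------------------------------------------------------------

Each download thread calls throttle(len(block)) after every block it
reads.  Note that this only limits the long term average - a thread that
is sleeping doesn't stop the others downloading, so short bursts can
still go over the limit.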


-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


