Downloading binary files - Python3

Peter Otten __peter__ at web.de
Sat Mar 21 10:45:48 EDT 2009


Anders Eriksson wrote:

> Hello,
> 
> I have made a short program that, given a URL, will download all
> referenced files on that page.
> 
> It works, but I'm thinking it could use some optimization since it's very
> slow.
> 
> I create a list of tuples where each tuple consist of the url to the file
> and the path to where I want to save it. E.g
> (http://somewhere.com/foo.mp3, c:\Music\foo.mp3)
> 
> The downloading part (which is the part I need help with) looks like this:
> def GetFiles():

Consider passing 'hreflist' explicitly. Global variables make your script
harder to manage in the long run.
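A minimal sketch of that refactoring, keeping the original function name but
taking the list as a parameter instead of reading a global:

```python
from urllib.request import urlopen

def GetFiles(hreflist):
    """Download each (url, path) pair; hreflist is now an explicit argument."""
    for url, path in hreflist:
        # open the destination in binary mode and write the whole response
        with open(path, mode="wb") as dstfile:
            dstfile.write(urlopen(url).read())
```

The caller then passes the list explicitly, e.g.
GetFiles([("http://somewhere.com/foo.mp3", r"c:\Music\foo.mp3")]).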

>     """do the actual copying of files"""
>     for url,path in hreflist:
>         print(url,end=" ")

You can force Python to write out its internal buffer by calling

          sys.stdout.flush()

You may also take a look at the logging package.
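A sketch of the idea (the helper name `report` is made up for illustration):

```python
import sys

def report(url):
    # end=" " suppresses the newline, so the text can sit in the output
    # buffer until a newline arrives; flushing pushes it to the screen
    # immediately, before the download starts.
    print(url, end=" ")
    sys.stdout.flush()
```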

>         srcdata = urlopen(url).read()

For large files you would read the source in chunks:

          src = urlopen(url)
          with open(path, mode="wb") as dstfile:
              while True:
                  chunk = src.read(2**20)
                  if not chunk:
                      break
                  dstfile.write(chunk)

Instead of writing this loop yourself you can use

          shutil.copyfileobj(src, dstfile)

or even

          urllib.request.urlretrieve(url, path)

which also takes care of opening the file.
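Put together, a chunk-safe helper might look like this (a sketch; the name
`download` is made up, and urlretrieve is shown as the one-call alternative):

```python
import shutil
from urllib.request import urlopen, urlretrieve

def download(url, path):
    # copyfileobj reads and writes in fixed-size chunks internally,
    # so memory use stays bounded even for very large files
    with urlopen(url) as src, open(path, "wb") as dstfile:
        shutil.copyfileobj(src, dstfile)

# or, in one call that also opens the destination file for you:
# urlretrieve(url, path)
```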

>         dstfile = open(path,mode='wb')
>         dstfile.write(srcdata)
>         dstfile.close()
>         print("Done!")
> 
> hreflist is the list of tuples.
> 
> at the moment the print(url,end=" ") is not printed before the actual
> download; instead it is printed at the same time as print("Done!").
> I would like it to work the way I intended.
> 
> Is downloading a binary file using: srcdata = urlopen(url).read()
> the best way? Is there some other way that would speed up the downloading?


The above method may not be faster (the operation is I/O-bound), but it
handles large files gracefully.

Peter



