How to batch download files from web page?

Michael Geary Mike at DeleteThis.Geary.com
Wed May 12 12:00:34 EDT 2004


> I wish to download hundreds of files from the University of Iowa
> sound archive.  Doing it manually would be a daunting task,
> especially since the files are each a few megabytes long.  Is there
> a standard way of using Python for such a task?  I have a fair
> amount of programming experience, but very little of it relates
> to networks.
>
> For those who are interested, the University of Iowa's sound archive
> may be found at http://theremin.music.uiowa.edu/MIS.html

The easiest way to download the individual files is with
urllib.urlretrieve().
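A minimal sketch of urlretrieve() in use. In Python 3 the same function lives at urllib.request.urlretrieve(). The helper name download() and the rule of naming the saved file after the last segment of the URL path are illustrative assumptions, not part of any library.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def download(url, dest_dir="."):
    """Fetch url with urlretrieve() and save it in dest_dir under its own file name."""
    # Hypothetical naming rule: use the last segment of the URL path (e.g. "tone.aiff").
    name = urllib.parse.unquote(posixpath.basename(urllib.parse.urlsplit(url).path))
    target = os.path.join(dest_dir, name)
    # urlretrieve() copies the resource to `target` and returns (local_path, headers).
    local_path, _headers = urllib.request.urlretrieve(url, target)
    return local_path


# Example (hypothetical URL):
# download("http://example.com/sounds/tone.aiff", "downloads")
```

The destination directory must already exist; urlretrieve() will not create it.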

You can parse the HTML files using either htmllib.HTMLParser or the
HTMLParser module, combined with urllib.urlopen().
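A sketch of link extraction using the Python 3 successors of those modules: the HTMLParser module became html.parser (htmllib was dropped), and urllib.urlopen() became urllib.request.urlopen(). The names LinkParser, find_links() and links_on_page() are hypothetical, and decoding the page as UTF-8 is an assumption; an older page may use a different charset.

```python
import urllib.request
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(html_text):
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


def links_on_page(url, encoding="utf-8"):
    # Assumption: the page decodes as `encoding`; bad bytes are replaced.
    with urllib.request.urlopen(url) as response:
        html_text = response.read().decode(encoding, errors="replace")
    return find_links(html_text)
```

The links come back exactly as written in the page, so relative ones still need to be resolved against the page URL before downloading.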

Try this Google search:

urlopen htmlparser

The first several matches have some nice code samples showing how to find
the links in a web page. Add urlretrieve() to download the files and you'll
be off and running.
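Putting the pieces together, here is a rough Python 3 sketch of a batch downloader. It reads one page, resolves each link against the page URL with urllib.parse.urljoin(), keeps only links whose file names end in the given extensions, and saves each file with urlretrieve(). batch_download() is a hypothetical name, and the default extensions (.aiff, .aif) are only a guess at what the archive serves, so check the page before relying on them. It follows links on the single page you give it and nothing deeper.

```python
import os
import posixpath
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def batch_download(page_url, dest_dir, extensions=(".aiff", ".aif")):
    """Download every file linked from page_url whose name ends in one of extensions."""
    with urllib.request.urlopen(page_url) as response:
        # Assumption: the page decodes as UTF-8; undecodable bytes are replaced.
        html_text = response.read().decode("utf-8", errors="replace")
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()

    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for href in parser.links:
        # Resolve relative links against the page's own URL.
        file_url = urllib.parse.urljoin(page_url, href)
        path = urllib.parse.urlsplit(file_url).path
        name = urllib.parse.unquote(posixpath.basename(path))
        if not name.lower().endswith(extensions):
            continue
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # already fetched (duplicate link or earlier run)
        urllib.request.urlretrieve(file_url, target)
        saved.append(target)
    return saved
```

For example, batch_download("http://theremin.music.uiowa.edu/MIS.html", "iowa_samples") would scan that index page; if the sound files turn out to be linked from sub-pages, call it once per sub-page. Skipping files that already exist means an interrupted run can simply be restarted.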

-Mike




