download web page with python

Skip Montanaro skip at mojam.com
Thu Dec 16 09:48:35 EST 1999


Here's a simple recipe for downloading a URL that's the result of a form
fill-in (examples from the original request):

 1. Download the page the form resides on.

    base = "http://www.symantec.com/avcenter/download.html"

 2. Locate the ACTION attribute of the form of interest.

    action = "/avcenter/cgi-bin/navsarc.cgi"

 3. Make note of the METHOD attribute of the form.

 4. Identify all the form's variables and any possible values they might
    assume.  Don't forget hidden <input>s and the name and value of submit
    <input>s if any were given.

    PROD      NDW, GW, NMC, GW, ...
    LANG      DA, DE, UK, US, ...

 5. Build a base URL from the result of 1 & 2 using urlparse.urljoin.

    import urlparse
    baseurl = urlparse.urljoin(base, action)

 6. If the form uses the GET method, build a full URL from the combination
    of the base URL and the parameters.

    url = "%s?PROD=NDW&LANG=DA" % baseurl
    valuedict = None

    If the form uses the POST method, you can try the above (many CGI
    scripts are method-agnostic), but the server may actually require that
    the URL be called using a POST method, so be prepared to pass the bare
    base URL together with a value dict.

    url = baseurl
    valuedict = {'PROD': 'NDW', 'LANG': 'DA'}

 7. Grab the fully parameterized URL using urllib.urlopen and read it.

    import urllib
    if valuedict:
        params = urllib.urlencode(valuedict)   # POST body
    else:
        params = None                          # plain GET
    f = urllib.urlopen(url, params)
    data = f.read()
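
The steps above can be sketched end to end, with steps 2-4 done by a small
parser instead of by eye.  This is only a rough sketch under some
assumptions: it uses the Python 3 module names (urllib.parse and html.parser
rather than the urlparse and urllib modules named above), FormScanner and
build_request are hypothetical helper names, it only looks at <input>
elements (not <select> or <textarea>), and the sample markup is made up for
illustration and is not the real page.

```python
# Python 3 sketch: urljoin and urlencode live in urllib.parse here.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlencode


class FormScanner(HTMLParser):
    """Hypothetical helper: record each <form> and its named <input>s."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per <form> seen
        self._current = None   # the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "GET").upper(),
                "inputs": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Keep default values, including hidden and submit inputs.
            self._current["inputs"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def build_request(base, form, values):
    """Return (url, data); data is None for GET, encoded bytes for POST."""
    fields = dict(form["inputs"])
    fields.update(values)                      # override the defaults
    baseurl = urljoin(base, form["action"])    # step 5
    encoded = urlencode(fields)                # step 6
    if form["method"] == "POST":
        return baseurl, encoded.encode("ascii")
    return baseurl + "?" + encoded, None


# Made-up markup standing in for the page downloaded in step 1.
SAMPLE = """
<form action="/avcenter/cgi-bin/navsarc.cgi" method="post">
  <input type="hidden" name="PROD" value="NDW">
  <input type="hidden" name="LANG" value="US">
  <input type="submit" name="go" value="Download">
</form>
"""

scanner = FormScanner()
scanner.feed(SAMPLE)
url, data = build_request(
    "http://www.symantec.com/avcenter/download.html",
    scanner.forms[0],
    {"LANG": "DA"},
)
```

The resulting pair can then be handed to urllib.request.urlopen(url, data)
for step 7; a data value of None yields a GET and a bytes value yields a
POST.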

Skip Montanaro | http://www.mojam.com/
skip at mojam.com | http://www.musi-cal.com/
847-971-7098   | Python: Programming the way Guido indented...



