download web page with python
Skip Montanaro
skip at mojam.com
Thu Dec 16 09:48:35 EST 1999
Here's a simple recipe for downloading a URL that's the result of a form
fill-in (examples from the original request):
1. Download the page the form resides on.
    base = "http://www.symantec.com/avcenter/download.html"
2. Locate the ACTION attribute of the form of interest.
    action = "/avcenter/cgi-bin/navsarc.cgi"
3. Make note of the METHOD attribute of the form.
4. Identify all the form's variables and any possible values they might
assume. Don't forget hidden <input>s and the name and value of submit
<input>s if any were given.
    PROD    NDW, GW, NMC, GW, ...
    LANG    DA, DE, UK, US, ...
5. Build a base URL from the results of steps 1 and 2 using urlparse.urljoin.
    import urlparse
    baseurl = urlparse.urljoin(base, action)
6. If you have a GET method URL, build a full URL from the combination of
the base URL and the parameters.
    url = "%s?PROD=NDW&LANG=DA" % baseurl
    valuedict = None
If you have a POST method URL, you can try the above (many CGI scripts
are method-agnostic), but the server may actually require the URL be
called using a POST method, so be prepared to build a value dict.
    url = baseurl
    valuedict = {'PROD': 'NDW', 'LANG': 'DA'}
7. Grab the fully parameterized URL using urllib.urlopen and read it.
    import urllib
    if valuedict:
        params = urllib.urlencode(valuedict)   # POST body
    else:
        params = None                          # plain GET
    f = urllib.urlopen(url, params)
    page = f.read()
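For readers on Python 3, where urlparse and urllib were reorganized into
urllib.parse and urllib.request, the steps above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
code from the original request: FormScanner and build_request are
hypothetical helper names, only the first <form> on the page is examined,
and only <input> and <select>/<option> elements with explicit VALUE
attributes are recorded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlencode


class FormScanner(HTMLParser):
    """Hypothetical helper: record ACTION, METHOD and the named fields
    of the first <form> on a page (steps 2 to 4)."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "GET"     # HTML default when METHOD is omitted
        self.fields = {}        # field name -> list of candidate values
        self._in_form = False
        self._done = False
        self._select = None     # name of the <select> being read, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        a = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = a.get("action")
            self.method = (a.get("method") or "GET").upper()
        elif not self._in_form:
            return
        elif tag == "input" and a.get("name"):
            # Covers hidden and submit inputs as well as visible ones.
            self.fields.setdefault(a["name"], []).append(a.get("value") or "")
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self.fields.setdefault(self._select, [])
        elif tag == "option" and self._select:
            self.fields[self._select].append(a.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_request(base, action, method, values):
    """Hypothetical helper: return (url, data) for steps 5 and 6.

    data is None for GET and an encoded bytes body for POST.
    """
    baseurl = urljoin(base, action)
    query = urlencode(values)
    if method.upper() == "POST":
        return baseurl, query.encode("ascii")
    return baseurl + "?" + query, None
```

Step 7 would then be something like
urllib.request.urlopen(url, data).read(), which issues a GET when data is
None and a POST otherwise. The page fetched in step 1 can be decoded and
passed to FormScanner().feed() to see which field names and values the
form offers before choosing the ones to send.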
Skip Montanaro | http://www.mojam.com/
skip at mojam.com | http://www.musi-cal.com/
847-971-7098 | Python: Programming the way Guido indented...