Using Python 2.1 to download asp www pages

Hamish Lawson hamish_lawson at yahoo.co.uk
Mon Jan 7 06:20:30 EST 2002


Zugz wrote:

> I've recently written some Python code to extract some details about posting
> frequency etc from a board I use regularly.
> 
> I used IE5.5's Save As to give me some pages to work on offline.
> 
> I would now like to automate the whole process by downloading all the
> relevant pages or maybe even just accessing them direct.

As others have mentioned, handling redirects could be tricky, since
they come in various guises (a meta refresh tag or a script in the
page, for instance) and urllib.urlopen is not a mini-browser capable
of interpreting the web page it retrieves. An alternative approach
may therefore be to use an actual browser, say IE, and drive it via
COM from Python. For this you will need the win32all library. If you
are using ActivePython, you should already have it. Otherwise you can
get it from:

  Python 1.5.2 through 2.1:

    http://aspn.activestate.com/ASPN/Downloads/ActivePython/Extensions/Win32all

  Python 2.2:

    http://users.bigpond.net.au/mhammond/win32all-142.exe
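
Once win32all is installed, driving IE might look roughly like the
sketch below. It is untested against your board and written for a
current Python rather than 2.1. The function name fetch_rendered_html
and its browser, timeout and poll_interval parameters are my own
invention for illustration. The COM members it relies on (Navigate,
Busy, ReadyState, Document, Visible, Quit) belong to IE's automation
object. The optional browser argument is there only so that the
waiting logic can be tried out with a stand-in object on a machine
without IE.

```python
import time


def fetch_rendered_html(url, browser=None, timeout=30.0, poll_interval=0.1):
    """Return the HTML of `url` once the browser has finished loading it.

    `browser` is any object exposing IE's automation interface. If it is
    omitted, a real IE instance is started through COM, which works only
    on Windows with the win32all extensions installed.
    """
    own_browser = browser is None
    if own_browser:
        # Imported lazily so this module still loads on other platforms.
        import win32com.client
        browser = win32com.client.Dispatch("InternetExplorer.Application")
        browser.Visible = False

    try:
        browser.Navigate(url)
        deadline = time.time() + timeout
        # A ReadyState of 4 is READYSTATE_COMPLETE in IE's automation model.
        while browser.Busy or browser.ReadyState != 4:
            if time.time() > deadline:
                raise RuntimeError("timed out loading %s" % url)
            time.sleep(poll_interval)
        # IE has followed any redirects by now, so this is the final page.
        return browser.Document.documentElement.outerHTML
    finally:
        # Only shut down a browser that this function started itself.
        if own_browser:
            browser.Quit()
```

You would call it as fetch_rendered_html("http://...") for each page
and feed the returned string to the parsing code you already have. A
page that keeps redirecting on a timer may never report itself as
complete, hence the timeout.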


Hamish Lawson


