Using Python 2.1 to download asp www pages that require cookies : My solution

Zugz zugz.public at DEL-ete-MEbtinternet.com
Thu Jan 17 16:58:22 EST 2002


Earlier I posted "Hi,

I've recently written some Python code to extract some details about posting
frequency etc from a board I use regularly.

I used IE5.5's Save As to give me some pages to work on offline.

I would now like to automate the whole process by downloading all the
relevant pages or maybe even just accessing them direct.

If I use urlopen on a regular .htm page, in this case from the collection of
links I call my www site, then things work as you would expect. You get the
html source:

>>> import urllib
>>> a = urllib.urlopen("http://www.zugz.btinternet.co.uk/NonSFBooksBookshops.htm")
>>> print a.read()

as you would hope.

However, if I access one of the pages of interest, which all have the same
form as below but with a varying page number at the end:

>>> a = urllib.urlopen("http://boards.gamers.com/messages/overview.asp?name=panther_xl&page=2")
>>> print a.read()

Then you do not get the page source but some HTML about the page being
moved.

So is this a consequence of it being an ASP page, and my luck is out, or is
there a simple way to achieve what I want anyway?

Thanks in advance for any help you may be able to give."

-------------------

Well, "I" solved it and documented the solution here:
http://www.zugz.btinternet.co.uk/python.htm
Wish I could claim credit though :(
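
For anyone who only wants the gist, here is a rough sketch of the general idea
in the subject line (keep the cookie the server hands out and send it back on
later requests). It is not the code from the page above. It assumes the "page
moved" response was the board's way of insisting on a cookie, and it assumes a
modern Python 3, because Python 2.1 has no cookie jar in the standard library
(cookielib only arrived in 2.4). The helper names make_cookie_opener, page_url
and fetch are made up for illustration.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar). The opener stores any cookies the server
    sets in `jar` and sends them back on later requests, including the
    request that follows a redirect."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def page_url(page, name="panther_xl"):
    """Build the overview URL from the original post for a given page
    number. The default board name is the one used in that post."""
    return ("http://boards.gamers.com/messages/overview.asp"
            "?name=%s&page=%d" % (name, page))


def fetch(opener, url):
    """Download one page through the cookie-aware opener, return bytes."""
    response = opener.open(url)
    try:
        return response.read()
    finally:
        response.close()


# Typical use (needs network access, so it is left commented out):
# opener, jar = make_cookie_opener()
# html = fetch(opener, page_url(2))
# print(html[:200])
```

Reusing one opener for every page matters here: the cookie picked up on the
first request is what should let the later ones through.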
Zugz.






