Using Python 2.1 to download asp www pages that require cookies : My solution
bernie
bernie at 3captus.com
Fri Jan 18 06:25:11 EST 2002
Hi Zugu,
This is a case where urllib.urlopen alone is not enough. The page you are
trying to get sends you a session cookie and expects you to send that cookie
back when you request the page.
The following code should work:
import urllib

class AppURLopener(urllib.FancyURLopener):
    # Opener that sends a fixed Cookie header with every request.
    def __init__(self, cookie=None, *args):
        urllib.FancyURLopener.__init__(self, *args)
        if cookie:
            self.addheader("Cookie", cookie)

# Session cookie value; substitute the one the server issued to you.
_cookie = "ASPSESSIONIDGGGQGOSO=EEKFJBBCFBDCAFJGPDKMLJDO"
# Make urllib.urlopen use this opener from now on.
urllib._urlopener = AppURLopener(_cookie)

a = urllib.urlopen("http://boards.gamers.com/messages/overview.asp?name=panther_xl&page=2")
print a.read()
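
For readers on Python 3, where the opener classes moved into urllib.request
and the URLopener family has been deprecated since Python 3.3, the same idea
can be sketched roughly as below. This is only a sketch under assumptions: the
helper names build_cookie_opener, build_session_opener and fetch are
hypothetical (they are not part of any library), and the code is untested
against the board above. The first helper sends a fixed Cookie header, as the
code above does; the second lets a CookieJar pick up whatever session cookie
the server sets and send it back on later requests.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(cookie=None):
    """Return an opener that sends a fixed Cookie header with every request."""
    opener = urllib.request.build_opener()
    if cookie:
        # Same idea as addheader("Cookie", cookie) in the Python 2 code.
        opener.addheaders.append(("Cookie", cookie))
    return opener


def build_session_opener():
    """Return (opener, jar); the jar stores cookies set by the server
    and sends them back on later requests made through the opener."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch(opener, url):
    """Fetch url through the given opener and return the raw response bytes."""
    with opener.open(url) as response:
        return response.read()
```

Calling fetch(build_cookie_opener(_cookie), url) with the cookie and URL from
the code above would correspond to the urlopen call there. With
build_session_opener, requesting the page through the same opener more than
once should let the jar supply the session cookie without hard-coding it.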
Zugz wrote:
> Earlier I posted "Hi,
>
> I've recently written some Python code to extract some details about posting
> frequency etc from a board I use regularly.
>
> I used IE5.5's Save As to give me some pages to work on offline.
>
> I would now like to automate the whole process by downloading all the
> relevant pages or maybe even just accessing them direct.
>
> If I use urlopen on a regular .htm page, in this case from the collection of
> links I call my www site, then things work as you would expect. You get the
> html source:
>
> >>> a=urllib.urlopen("http://www.zugz.btinternet.co.uk/NonSFBooksBookshops.htm")
> >>> print a.read()
>
> as you would hope.
>
> However if I access one of the pages of interest, which all have the same
> form as below but with a varying page number at the end:
>
> >>> a=urllib.urlopen("http://boards.gamers.com/messages/overview.asp?name=panther_xl&page=2")
> >>> print a.read()
>
> Then you do not get the page source but some HTML about the page being
> moved.
>
> So is this a function of it being an asp page, and my luck is out, or is
> there a simple way to achieve what I wish anyway?
>
> Thanks in advance for any help you may be able to give."
>
> -------------------
>
> Well "I" solved it and documented the solution here:
> http://www.zugz.btinternet.co.uk/python.htm
> Wish I could claim credit though :(
> Zugz.