[Tutor] Retrieving Webpage Source, a Problem with 'onclick'
Kent Johnson
kent37 at tds.net
Sat May 21 16:05:17 CEST 2005
You may be interested in Pamie:
http://pamie.sourceforge.net/
Kent
Craig Booth wrote:
> Hi,
>
> I am trying to loop over all of the links in a given webpage and
> retrieve the source of each of the child pages in turn.
>
> My problem is that the links are in the following form:
>
> [begin html]
> <a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
> <a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
> <a href="#" onclick="gS(1020,14);return false;" class="ln">link3</a>
> <a href="#" onclick="gS(1020,1);return false;" class="ln">link4</a>
> [end html]
>
> So clicking a link appears to call the JavaScript function gS, which
> creates the page dynamically.
>
> I can't figure out how to get urllib/urllib2 to work here as the URL of
> each of these links is http://www.thehomepage.com/#.
>
> I have tried to get mechanize to click each link, but once again it doesn't
> run the onclick handler and just goes to http://www.thehomepage.com/#.
>
> This blog (http://blog.tomtebo.org/programming/lagen.nu_tech_2.html)
> strongly suggests that the easiest way to do this is to use IE and COM
> automation (which is fine as I am working on a Windows PC), so I have tried
> importing win32com.client and actually getting IE to click the link:
>
> [begin code]
>
> from time import sleep
> from win32com.client import Dispatch
>
> ie = Dispatch("InternetExplorer.Application")
> ie.Visible = 1
> ie.Navigate('http://www.thehomepage.com')
>
> # Wait until the page has finished loading
> while ie.Busy:
>     sleep(2)
>
> # Print page title
> print(ie.LocationName)
>
> # Try to follow the link at index 30
> test = ie.Document.links
> ie.Navigate(ie.Document.links(30))
>
> [end code]
>
> This should just click the 30th link on the page. As with the other
> methods, this takes me to http://www.thehomepage.com/# and doesn't call the
> JavaScript.
>
> If somebody who has more experience in these matters could suggest a
> course of action I would be grateful. I'm more than happy to use any
> method (urllib, mechanize, IE & COM as tried so far) just so long as it
> works :)
>
> Thanks in advance,
> Craig.
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>
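
For readers following the thread: because every href is just "#", the useful
information in these links is the pair of arguments passed to gS in the
onclick attribute. The sketch below (Python 3, standard library only) pulls
those pairs out of the HTML quoted above. The names OnclickCollector,
extract_gs_calls and GS_CALL are made up for this illustration. What gS does
with the arguments, and what URL (if any) it requests, is not shown in the
post, so turning a pair into a page fetch would still require reading the gS
source in the site's own scripts.

```python
import re
from html.parser import HTMLParser

# Matches a call such as gS(1020,19) inside an onclick attribute.
# Assumes both arguments are plain integers, as in the quoted HTML.
GS_CALL = re.compile(r"gS\(\s*(\d+)\s*,\s*(\d+)\s*\)")


class OnclickCollector(HTMLParser):
    """Collect the integer arguments of gS(...) from <a onclick=...> tags."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        onclick = dict(attrs).get("onclick") or ""
        match = GS_CALL.search(onclick)
        if match:
            self.calls.append((int(match.group(1)), int(match.group(2))))


def extract_gs_calls(html):
    """Return a list of (first_arg, second_arg) tuples, in document order."""
    parser = OnclickCollector()
    parser.feed(html)
    parser.close()
    return parser.calls


sample = """
<a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
<a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
"""

print(extract_gs_calls(sample))
```

This only recovers the arguments; it does not execute any JavaScript. Driving
a real browser, as in the COM approach above or with Pamie, remains the
option when the page content can only be produced by running gS itself.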