[Tutor] Retrieving Webpage Source, a Problem with 'onclick'

Kent Johnson kent37 at tds.net
Sat May 21 16:05:17 CEST 2005


You may be interested in Pamie:
http://pamie.sourceforge.net/
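
If you would rather stay with plain win32com, the likely problem is that ie.Navigate() expects a URL, so passing it a link object just loads that link's href ("#"). Calling the link's own click() method should run the onclick handler instead. Here is a minimal sketch, written for Python 3 and assuming Windows with pywin32 installed. The helper names (wait_until_loaded, click_link, open_ie) are mine, and the wait logic assumes gS() triggers a normal page load, which I can't confirm from your post.

```python
import time


def wait_until_loaded(ie, poll=0.25, timeout=30.0):
    """Block until the browser reports the page as fully loaded."""
    deadline = time.monotonic() + timeout
    # ReadyState 4 is READYSTATE_COMPLETE in the IE automation model.
    while ie.Busy or ie.ReadyState != 4:
        if time.monotonic() > deadline:
            raise TimeoutError("page did not finish loading")
        time.sleep(poll)


def click_link(ie, index):
    """Click the link at `index` so its onclick handler runs; return the HTML."""
    link = ie.Document.links(index)
    link.click()  # fires onclick, unlike ie.Navigate(link)
    wait_until_loaded(ie)
    return ie.Document.documentElement.outerHTML


def open_ie(url):
    """Start IE through COM and load `url` (Windows with pywin32 only)."""
    from win32com.client import Dispatch  # imported lazily: Windows-only

    ie = Dispatch("InternetExplorer.Application")
    ie.Visible = 1
    ie.Navigate(url)
    wait_until_loaded(ie)
    return ie
```

You would call ie = open_ie('http://www.thehomepage.com') and then click_link(ie, 30). To visit every link in turn, reload the start page between clicks, since the old link objects may no longer be valid after the page changes.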

Kent

Craig Booth wrote:
> Hi,
> 
>    I am trying to loop over all of the links in a given webpage and
> retrieve the source of each of the child pages in turn.
> 
>    My problem is that the links are in the following form:
> 
> [begin html]
> <a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
> <a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
> <a href="#" onclick="gS(1020,14);return false;" class="ln">link3</a>
> <a href="#" onclick="gS(1020,1);return false;" class="ln">link4</a>
> [end html]
> 
>   So clicking the links appears to call the Javascript function gS to
> dynamically create pages.
> 
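
The onclick attributes are ordinary text in the page source, so even without a browser you can at least collect the arguments passed to gS() for each link. The following is a rough sketch using only the standard library (Python 3). The names OnclickCollector and gs_arguments are made up for illustration, and the pattern assumes every call has two integer arguments, as in your sample.

```python
import re
from html.parser import HTMLParser

# Matches calls such as gS(1020,19) inside an onclick attribute.
GS_CALL = re.compile(r"gS\(\s*(\d+)\s*,\s*(\d+)\s*\)")


class OnclickCollector(HTMLParser):
    """Collect the numeric arguments of each gS(...) call found on <a> tags."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        onclick = dict(attrs).get("onclick") or ""
        match = GS_CALL.search(onclick)
        if match:
            self.calls.append((int(match.group(1)), int(match.group(2))))


def gs_arguments(html):
    """Return a list of (first, second) argument pairs, in document order."""
    parser = OnclickCollector()
    parser.feed(html)
    return parser.calls


sample = '''
<a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
<a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
'''
print(gs_arguments(sample))  # [(1020, 19), (1020, 8)]
```

What to do with those pairs depends on what gS() does, which isn't shown here. You would need to read the page's Javascript to see which URL, if any, it requests for a given pair.
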
>   I can't figure out how to get urllib/urllib2 to work here as the URL of
> each of these links is http://www.thehomepage.com/#.
> 
>   I have tried to get mechanize to click each link, but once again it
> doesn't send the onclick request and just goes to
> http://www.thehomepage.com/#.
> 
> This blog (http://blog.tomtebo.org/programming/lagen.nu_tech_2.html)
> strongly suggests that the easiest way to do this is to use IE and COM
> automation (which is fine as I am working on a windows PC) so I have tried
> importing win32com.client and actually getting IE to click the link:
> 
> [begin code]
> 
> from time import sleep
> from win32com.client import Dispatch
> 
> ie = Dispatch("InternetExplorer.Application")
> ie.Visible = 1
> ie.Navigate('http://www.thehomepage.com')
> 
> # it takes a little while for the page to load, so wait until IE is idle
> while ie.Busy:
>     sleep(2)
> 
> # Print page title
> print ie.LocationName
> 
> test = ie.Document.links
> ie.Navigate(ie.Document.links(30))
> 
> [end code]
> 
>   This should just click the 30th link on the page.  As with the other
> methods, it takes me to http://www.thehomepage.com/# and doesn't call the
> Javascript.
> 
>    If somebody who has more experience in these matters could suggest a
> course of action I would be grateful.  I'm more than happy to use any
> method (urllib, mechanize, IE & COM as tried so far) just so long as it
> works :)
> 
>    Thanks in advance,
>       Craig.
> 


