[Tutor] Retrieving Webpage Source, a Problem with 'onclick'

Liam Clarke cyresse at gmail.com
Sat May 21 23:27:11 CEST 2005


Wow, I was going to give a rundown of how to use COM, but use PAMIE 
instead; it's way easier.

If you do want to use COM & win32all instead, just bear in mind that the way 
to simulate an OnClick is to call click() on the element itself (taken from 
the blog you quoted) - 

ie.Document.frames(1).Document.forms(0).all.item("buttonSok").click()
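
Applied to the links on your page, that idea might look roughly like the 
sketch below. It is untested against your site: the helper names 
(wait_until_loaded, click_link, main), the timeout values and the choice to 
return the body HTML are my own assumptions, and it only relies on the IE 
properties already used in this thread (Busy, Navigate, Document.links) plus 
ReadyState and click().

```python
import time

READYSTATE_COMPLETE = 4  # value IE reports once a page has fully loaded


def wait_until_loaded(ie, poll=0.5, timeout=30.0):
    """Block until the browser is idle, or raise after `timeout` seconds."""
    deadline = time.time() + timeout
    while ie.Busy or ie.ReadyState != READYSTATE_COMPLETE:
        if time.time() > deadline:
            raise RuntimeError("page did not finish loading in time")
        time.sleep(poll)


def click_link(ie, index):
    """Call click() on the link so that its onclick handler runs.

    Passing the link to ie.Navigate() only follows href="#".
    Returns the HTML of the page body afterwards.
    """
    ie.Document.links(index).click()
    wait_until_loaded(ie)
    return ie.Document.body.innerHTML


def main():
    # Windows only: needs the win32all (pywin32) package.
    from win32com.client import Dispatch

    ie = Dispatch("InternetExplorer.Application")
    ie.Visible = 1
    ie.Navigate("http://www.thehomepage.com")
    wait_until_loaded(ie)
    print(click_link(ie, 30))
```

Call main() on a Windows box to try it. If gS fills the page in 
asynchronously, Busy may not be set yet right after the click, so a short 
sleep before polling might be needed.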

But yeah, use PAMIE instead.

Regards, 

Liam Clarke

PS Cheers for the link also Kent.

On 5/22/05, Kent Johnson <kent37 at tds.net> wrote:
> 
> You may be interested in Pamie:
> http://pamie.sourceforge.net/
> 
> Kent
> 
> Craig Booth wrote:
> > Hi,
> >
> > I am trying to loop over all of the links in a given webpage and
> > retrieve the source of each of the child pages in turn.
> >
> > My problem is that the links are in the following form:
> >
> > [begin html]
> > <a href="#" onclick="gS(1020,19);return false;" class="ln">link1</a>
> > <a href="#" onclick="gS(1020,8);return false;" class="ln">link2</a>
> > <a href="#" onclick="gS(1020,14);return false;" class="ln">link3</a>
> > <a href="#" onclick="gS(1020,1);return false;" class="ln">link4</a>
> > [end html]
> >
> > So clicking the links appears to call the Javascript function gS to
> > dynamically create pages.
> >
> > I can't figure out how to get urllib/urllib2 to work here as the URL of
> > each of these links is http://www.thehomepage.com/#.
> >
> > I have tried to get mechanize to click each link, once again it doesn't
> > send the onclick request and just goes to http://www.thehomepage.com/#
> >
> > This blog (http://blog.tomtebo.org/programming/lagen.nu_tech_2.html)
> > strongly suggests that the easiest way to do this is to use IE and COM
> > automation (which is fine as I am working on a windows PC) so I have tried
> > importing win32com.client and actually getting IE to click the link:
> >
> > [begin code]
> >
> > from time import sleep
> > from win32com.client import Dispatch
> >
> > ie = Dispatch("InternetExplorer.Application")
> > ie.Visible = 1
> > ie.Navigate('http://www.thehomepage.com')
> >
> > #it takes a little while for page to load
> > if ie.Busy:
> >     sleep(2)
> >
> > #Print page title
> > print ie.LocationName
> >
> > test=ie.Document.links
> > ie.Navigate(ie.Document.links(30))
> >
> > [end code]
> >
> > Which should just click the 30th link on the page. As with the other
> > methods this takes me to http://www.thehomepage.com/# and doesn't call the
> > Javascript.
> >
> > If somebody who has more experience in these matters could suggest a
> > course of action I would be grateful. I'm more than happy to use any
> > method (urllib, mechanize, IE & COM as tried so far) just so long as it
> > works :)
> >
> > Thanks in advance,
> > Craig.
> >
> > _______________________________________________
> > Tutor maillist - Tutor at python.org
> > http://mail.python.org/mailman/listinfo/tutor
> >
> 



-- 
'There is only one basic human right, and that is to do as you damn well 
please.
And with it comes the only basic human duty, to take the consequences.'