read web page that requires javascript on client

Jeremiah Dodds jeremiah.dodds at gmail.com
Thu Mar 19 06:38:33 EDT 2009


On Thu, Mar 19, 2009 at 1:25 AM, Carl <tg2.user at gmail.com> wrote:

> Probably the easiest thing is to actually use a browser. There are
> many examples of automating a browser via Python. So, you can
> programmatically launch the browser, point it to the JavaScript
> afflicted page, let the JS run and grab the page source. As an added
> bonus you can later interact with the page programmatically by filling
> form fields, selecting options from lists and clicking buttons.
>
> HTH, Carl


I've been using the Python port of mechanize (specifically
mechanize.Browser) for web automation, and it's rather nice.[1]

In the vast majority of cases, if I need something done with JavaScript,
it's just to generate a URL or POST data. In that case I can either read the
JavaScript and figure out what it's doing, or use something like
LiveHTTPHeaders[2] to find out what I need to be sending.
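
As a rough sketch of what "read the JavaScript and send the same thing
yourself" can look like, assume the page's script just builds a search URL
and a login POST. The site (example.com), the paths, the field names and both
helper functions below are made up for illustration; only the standard
library is used, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Hypothetical site; substitute whatever the real page talks to.
BASE_URL = "http://example.com/"


def build_search_url(query, page=1):
    """Rebuild the URL a (hypothetical) script would generate for a search."""
    # urlencode handles escaping, much like encodeURIComponent in the script.
    params = urlencode({"q": query, "page": page})
    return urljoin(BASE_URL, "search") + "?" + params


def build_login_request(user, password):
    """Prepare, without sending, the POST a (hypothetical) script would submit."""
    # Field names are assumptions; read them from the script or from the
    # headers captured in the browser.
    data = urlencode({"user": user, "password": password}).encode("ascii")
    # Supplying data makes urllib treat the request as a POST.
    return Request(urljoin(BASE_URL, "login"), data=data)


if __name__ == "__main__":
    print(build_search_url("python list", 2))
    req = build_login_request("me", "s3cret")
    print(req.get_method(), req.full_url, req.data)
```

Passing the prepared request to urllib.request.urlopen would send it; the
same URL and form data could just as well be handed to a mechanize browser.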

This obviously doesn't cover every case, but it works fine most of the
time. I would love to see a complete JavaScript interpreter and DOM
interface in Python, but it's hard, hard stuff.

I'd love to have the time to study the ECMAScript spec well enough to
understand JavaScript's internals and contribute to one of the current
attempts at building one.

1. http://wwwsearch.sourceforge.net/mechanize/
2. https://addons.mozilla.org/en-US/firefox/addon/3829