[Ncr-Python.in] Executing js/ajax in a sandboxed environment
Noufal Ibrahim
noufal at gmail.com
Sun Feb 27 06:06:32 CET 2011
On Sun, Feb 27 2011, Rohan Malhotra wrote:
> I need to parse a url for its content in python. I want to execute
> all js of html page associated with the url and then parse for the
> content so that javascript induced changes in content are also
> present. I was wondering if there is a way to execute js associated in
> page in sandbox environment before I start parsing it.
>
> BeautifulSoup library only fetches source of page. I need the access
> to DOM after js execution with url as input parameter.
Maybe you can fetch the page (and parse it using BeautifulSoup or
something like that) and then use python-spidermonkey to execute the
JavaScript.
http://code.google.com/p/python-spidermonkey/
I've not done this myself so I don't know how effective it will be.
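A minimal sketch of the first half of that idea, using only the standard
library's html.parser in place of BeautifulSoup: collect the page's scripts
in document order and hand each inline one to a JavaScript engine. The names
ScriptCollector, collect_scripts and run_inline are made up for illustration,
and `execute` is whatever callable the engine provides. With
python-spidermonkey that would presumably be the `execute` method of a
context from `spidermonkey.Runtime().new_context()`; that is taken from the
project's README and is untested here. Note that SpiderMonkey on its own
provides no DOM, so scripts that touch `document` or `window` would fail
unless those objects are supplied.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect <script> elements in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        # Each entry is ("inline", source_text) or ("src", url).
        self.scripts = []
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # External script: record the URL, fetching is left to the caller.
            self.scripts.append(("src", src))
        else:
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append(("inline", "".join(self._buffer)))
            self._in_script = False


def collect_scripts(html):
    """Return the list of scripts found in an HTML string."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts


def run_inline(scripts, execute):
    """Pass each inline script to `execute`; return the external URLs
    that were skipped and still need to be fetched and run."""
    pending = []
    for kind, value in scripts:
        if kind == "inline":
            execute(value)
        else:
            pending.append(value)
    return pending
```

Passing the engine in as a plain callable keeps the extraction step
independent of whichever JavaScript engine ends up being used.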
If it's an option to have a browser running where your script is
executing, you might be able to ask Firefox to do all the JavaScript
work and then fetch the rendered page using MozRepl
(https://github.com/bard/mozrepl/wiki).
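For the MozRepl route, a rough sketch of talking to it from Python over a
plain socket. It assumes MozRepl is listening on its default port 4242 on
localhost and that its prompt ends in "repl> "; neither is verified here,
and mozrepl_eval is a made-up helper name.

```python
import socket


def _read_until(sock, marker):
    """Read from the socket until the data ends with `marker`."""
    data = b""
    while not data.endswith(marker):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before prompt was seen")
        data += chunk
    return data


def mozrepl_eval(expression, host="127.0.0.1", port=4242,
                 prompt=b"repl> ", timeout=10.0):
    """Send one JavaScript expression to a MozRepl-style REPL and return
    the text printed before the next prompt (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Skip the welcome banner up to the first prompt.
        _read_until(sock, prompt)
        sock.sendall(expression.encode("utf-8") + b"\n")
        reply = _read_until(sock, prompt)
    # Drop the trailing prompt and surrounding whitespace.
    return reply[:-len(prompt)].decode("utf-8", "replace").strip()
```

With a page already loaded in Firefox, something like
`mozrepl_eval("content.document.documentElement.innerHTML")` might then
return the markup after the scripts have run, assuming `content` refers to
the current tab's window in the REPL. The result could be passed on to
BeautifulSoup for parsing.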
--
~noufal
http://nibrahim.net.in