[Ncr-Python.in] Executing js/ajax in a sandboxed environment

Noufal Ibrahim noufal at gmail.com
Sun Feb 27 06:06:32 CET 2011


On Sun, Feb 27 2011, Rohan Malhotra wrote:

> I need to parse a url for its content in python. I want to execute
> all js of html page associated with the url and then parse for the
> content so that javascript induced changes in content are also
> present. I was wondering if there is a way to execute js associated in
> page in sandbox environment before I start parsing it.
>
> BeautifulSoup library only fetches source of page. I need the access
> to DOM after js execution with url as input parameter.

Maybe you can fetch the page (and parse it using BeautifulSoup or
something like that) and then use python-spidermonkey to execute the
JavaScript.
http://code.google.com/p/python-spidermonkey/

I've not done this myself so I don't know how effective it will be. 
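
A rough sketch of the first half of that idea, pulling the scripts out of
a page so they can be handed to a JavaScript engine. It uses only the
standard library's html.parser instead of BeautifulSoup to stay
self-contained; ScriptCollector and collect_scripts are names I made up
for illustration. The python-spidermonkey calls in the trailing comment
are taken from my reading of that project's README and are untested.
Keep in mind that a bare JavaScript engine has no `document` or `window`
object, so scripts that touch the DOM will not run unless you supply
those yourself.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect inline script bodies and external script URLs from a page."""

    def __init__(self):
        super().__init__()
        self.inline = []         # source text of inline <script> blocks
        self.external = []       # values of <script src="..."> attributes
        self._in_script = False  # True while inside a <script> element
        self._chunks = []        # pieces of the current script body

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            self._in_script = True
            self._chunks = []

    def handle_data(self, data):
        # Script bodies can arrive in several pieces, so accumulate them.
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._chunks).strip()
            if body:
                self.inline.append(body)
            self._in_script = False


def collect_scripts(html):
    """Return (inline_sources, external_urls) found in an HTML string."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.inline, parser.external


# Each string in the inline list could then be passed to an engine.
# With python-spidermonkey that should look roughly like this
# (assumption based on the README, not something I have run):
#
#   import spidermonkey
#   rt = spidermonkey.Runtime()
#   cx = rt.new_context()
#   cx.execute(source)
```

The external URLs would have to be fetched separately and executed in
the order they appear in the page.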

If it's an option to have a browser running where your script is
executing, you might be able to ask Firefox to do all the JavaScript
work and then fetch the rendered page using MozRepl
(https://github.com/bard/mozrepl/wiki).
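
A crude sketch of what talking to MozRepl from Python might look like.
It assumes Firefox is running with MozRepl started and listening on
localhost port 4242 (its default), and that the REPL starts in the
browser window so that `content` refers to the page in the current tab.
build_commands and fetch_rendered are hypothetical helper names, and
the fixed sleep is only a stand-in for properly waiting until the page
has loaded. The text that comes back will still contain the REPL's
banner, prompts and quoting, which you would have to strip before
parsing.

```python
import json
import socket
import time


def build_commands(url):
    """Return the two JavaScript lines to send to the REPL for one URL."""
    return [
        # json.dumps gives a safely quoted JavaScript string literal.
        "content.location.href = %s;" % json.dumps(url),
        # Serialise the DOM as it stands after the scripts have run.
        "content.document.documentElement.innerHTML;",
    ]


def _drain(sock):
    """Read whatever the REPL has sent until the socket times out."""
    chunks = []
    try:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    except socket.timeout:
        pass
    return b"".join(chunks)


def fetch_rendered(url, wait=5.0, host="localhost", port=4242):
    """Load url in the running Firefox and return the raw REPL output."""
    navigate, dump = build_commands(url)
    sock = socket.create_connection((host, port), timeout=wait)
    try:
        sock.sendall((navigate + "\n").encode("utf-8"))
        time.sleep(wait)  # crude: give the page time to load and run
        _drain(sock)      # discard the banner and the navigation echo
        sock.sendall((dump + "\n").encode("utf-8"))
        return _drain(sock).decode("utf-8", "replace")
    finally:
        sock.close()
```

You would then call something like fetch_rendered("http://example.com/")
and feed the cleaned-up result to BeautifulSoup.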



-- 
~noufal
http://nibrahim.net.in