Screenscraping, in python, a web page that requires javascript?

John J. Lee jjl at pobox.com
Thu Aug 9 17:41:32 EDT 2007


Dan Stromberg - Datallegro <dstromberg at datallegro.com> writes:

> Is there a method, with python, of screenscraping a web page, if that web
> page uses javascript?

Not pure CPython, no.


> I know about BeautifulSoup, but AFAIK at this time, BeautifulSoup is for
> HTML that doesn't have embedded javascript.

It's not that BeautifulSoup is unhappy with JS; it just has no support
for executing it.
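
To illustrate, here is a minimal sketch. It uses the standard library's
html.parser instead of BeautifulSoup only to avoid a third-party
dependency; a pure HTML parser of either kind behaves the same way here.
The page and its "price" element are made up for the example. In a
browser the script would fill in the div, but a parser only sees the
script's source as text, and the div stays empty.

```python
from html.parser import HTMLParser

# Hypothetical page: the visible value is only inserted by JavaScript.
PAGE = """
<html><body>
  <div id="price"></div>
  <script>document.getElementById("price").textContent = "42";</script>
</body></html>
"""


class PriceScraper(HTMLParser):
    """Collect the text inside <div id="price"> and inside <script>."""

    def __init__(self):
        super().__init__()
        self.current = None  # "price", "script", or None (outside both)
        self.price_text = ""
        self.script_text = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "div" and ("id", "price") in attrs:
            self.current = "price"
        elif tag == "script":
            self.current = "script"

    def handle_endtag(self, tag):
        if tag in ("div", "script"):
            self.current = None

    def handle_data(self, data):
        if self.current == "price":
            self.price_text += data
        elif self.current == "script":
            self.script_text += data  # the JS source, never executed


parser = PriceScraper()
parser.feed(PAGE)
parser.close()
print(repr(parser.price_text))  # the div is empty: nothing ran the script
print(parser.script_text)       # the script is just a string to the parser
```

Getting "42" out of that page requires something that actually runs the
JS, which is what the libraries below provide.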

There are some Java libraries that can execute the JS embedded in web
pages, and they could be used from Jython:

http://www.thefrontside.net/crosscheck

http://htmlunit.sourceforge.net/

http://httpunit.sourceforge.net/


You can also automate a browser, but that still seems to be painful in
one way or another.


John



More information about the Python-list mailing list