Parsing HTML with JavaScript

John J. Lee jjl at pobox.com
Fri May 13 16:29:19 EDT 2005


mtfulmer at tacobell.land writes:

> I am trying to extract some information from a few web pages, and I was
> using the HTMLParser module. It worked fine until it got to the
> JavaScript, at which point it gave a parse error. Is there a good way to
> work around this, or should I just preparse the file to remove the
> JavaScript manually? This is my first Python program.

sgmllib is very similar to HTMLParser, but doesn't break so easily.
It has some problems of its own with XHTML, though -- swings and
roundabouts.

Or, try BeautifulSoup.
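
For what it's worth, the "preparse the file to remove the javascript" idea
from the question might look roughly like the sketch below. It is only an
illustration under some assumptions: it targets Python 3, where the
HTMLParser module lives on as html.parser, and it uses a crude regular
expression that is fine for simple pages but is not a real tokenizer. The
names strip_scripts, TextCollector, extract_text, SCRIPT_RE and the sample
page are all made up for the example.

```python
import re
from html.parser import HTMLParser

# Crude pattern for <script>...</script> blocks. This is an assumption that
# holds for simple pages; it is not a full HTML tokenizer.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_RE.sub("", html)


class TextCollector(HTMLParser):
    """Collect the non-blank text chunks found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Strip script blocks first, then parse what is left."""
    parser = TextCollector()
    parser.feed(strip_scripts(html))
    parser.close()
    return parser.chunks


# Hypothetical page with the kind of inline script that trips up a parser.
sample = """<html><body>
<h1>Menu</h1>
<script type="text/javascript">
if (a < b && c > d) { document.write("<b>hi</b>"); }
</script>
<p>Price: 5</p>
</body></html>"""

print(extract_text(sample))
```

Stripping before parsing keeps the parser class itself trivial, at the cost
of trusting the regular expression; a script containing the literal text
"</script>" inside a string would cut the match short.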


John



More information about the Python-list mailing list