Parsing HTML with JavaScript

Richard Brodie R.Brodie at rl.ac.uk
Fri May 13 04:53:01 EDT 2005


<mtfulmer at tacobell.land> wrote in message news:slrnd88pns.qsm.mtfulmer at tacobell.land...

> I am trying to extract some information from a few web pages, and I was
> using the HTMLParser module. It worked fine until it got to the
> JavaScript, at which point it gave a parse error.

It's fairly common for pages that contain JavaScript to also be invalid
HTML. HTMLParser isn't an 'ignore all errors silently and guess what was
meant' parser. Unless you know your inputs are well-formed, it's often
best to use a more forgiving alternative. Some options are discussed in
Uche's article here: http://www.xml.com/pub/a/2004/09/08/pyxml.html
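
For illustration only, here is a minimal sketch of the forgiving approach. It assumes a current Python 3, where html.parser.HTMLParser no longer raises on bad markup; that differs from the strict HTMLParser module the question was about. The names TextExtractor, extract_text and SAMPLE are made up for this example. It collects the visible text and skips whatever sits inside <script> or <style>.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}  # elements whose content is not page text

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the list of visible text fragments found in html."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return parser.chunks


# Hypothetical input: an unclosed <p> and script code full of '<' and '>'.
SAMPLE = """<html><body>
<p>First paragraph
<script type="text/javascript">
if (a < b && c > d) { document.write("<b>hi</b>"); }
</script>
<p>Second paragraph</p>
</body></html>"""

print(extract_text(SAMPLE))
```

This only shows the general idea. For really messy pages, one of the tolerant parsers covered in the article above is likely to be a better fit than hand-rolled handlers.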




