HTML parsing confusion

Wed Jan 23 10:33:43 EST 2008

On Jan 23, 2008 7:40 AM, Alnilam <alnilam at gmail.com> wrote:
> Skipping past html validation, and html to xhtml 'cleaning', and
> instead starting with the assumption that I have files that are valid
> XHTML, can anyone give me a good example of how I would use _ htmllib,
> HTMLParser, or ElementTree _ to parse out the text of one specific
> childNode, similar to the examples that I provided above using regex?

Have you looked at any of the tutorials or sample code for these
libraries?  If you have a specific question, you will probably get more
specific help.  I started writing up some sample code, but realized I
was mostly reprising the long tutorial on sgmllib here:
http://www.boddie.org.uk/python/HTML.html
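
For the ElementTree route specifically, here is a minimal sketch of
pulling the text out of one particular element. It assumes the input
really is well-formed XHTML in the standard XHTML namespace and that
the target can be picked out by an attribute such as id. The sample
document and the helper name text_of_element are made up for
illustration; they are not from the tutorial above.

```python
import xml.etree.ElementTree as ET

# Elements in a namespaced XHTML document are reported as "{namespace}tag".
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample input, standing in for one of the valid XHTML files.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <div id="intro"><p>First paragraph.</p></div>
    <div id="content"><p>Second <b>bold</b> paragraph.</p></div>
  </body>
</html>"""


def text_of_element(xhtml, tag, attr, value):
    """Return the text of the first <tag> whose attribute attr equals value.

    Text from nested child elements is included. Returns None when no
    element matches.
    """
    root = ET.fromstring(xhtml)
    # iter() walks the whole tree, so the element may sit at any depth.
    for elem in root.iter(XHTML_NS + tag):
        if elem.get(attr) == value:
            # itertext() yields the element's own text plus the text and
            # tails of its descendants, in document order.
            return "".join(elem.itertext()).strip()
    return None


if __name__ == "__main__":
    print(text_of_element(SAMPLE, "div", "id", "content"))
```

Note that ElementTree is a strict XML parser: it will raise an error on
anything that is not well-formed, and on named entities such as &nbsp;
that it has no definition for, so this only works under the "files are
valid XHTML" assumption stated in the question.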

-- 
Jerry