[XML-SIG] How does one process HTML with the DOM support in PyXML?

Fred L. Drake, Jr. fdrake@acm.org
Tue, 12 Jun 2001 22:52:35 -0400 (EDT)


Bill Janssen writes:
 > I've been looking at the PyXML docs, to see whether it could be
 > used to parse HTML files.  There seems to be something interesting

Bill,
  You don't say much about what you're interested in doing with the
HTML, and whether you need to be able to handle HTML "as deployed" or
just valid stuff.  Also, what about XHTML?
  If HTML as deployed and XHTML are relevant, you may want to look at
the HTMLParser module added to the standard library for Python 2.2.
That can certainly be extracted from the CVS if it looks interesting.
Basic documentation is available at:

    http://python.sourceforge.net/devel-docs/lib/module-HTMLParser.html

  It needs a little more high-level description, though!
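
  As a rough sketch of how the module is used: you subclass HTMLParser,
override the handler methods you care about, and feed it text.  The
LinkCollector class and collect_links() function below are hypothetical
names made up for illustration, and the import fallback assumes you may
be running a later Python in which the module lives at html.parser.

```python
# The module is named HTMLParser in Python 2.2; in Python 3 it moved to
# html.parser.  Try the newer location first and fall back to the old one.
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example: collect the href values of <a> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser passes the tag name in lower case and the
        # attributes as a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect_links(text):
    """Return the href values found in a string of HTML."""
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links
```

  Calling collect_links() on a string of markup returns the list of
link targets in document order.  The parser is event-driven, so it
does not build a DOM tree for you; you'd have to construct one from
the handler callbacks if that's what you need.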


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations