[XML-SIG] Parsing malformed XHTML

Andrew Shearer ashearerw at shearersoftware.com
Tue May 23 22:38:39 CEST 2006


> Lars Kellogg-Stedman wrote:
>
> > I need to parse this document into a DOM, make some changes, and then
> > spit back out the modified file as (X?)HTML (ideally well-formed).  Am
> > I going to be able to do this with PyXML?  If not, I'd love to hear
> > your suggestions for the appropriate tools.
> >
> > Thanks!
> >
> > -- Lars
> You might want to look into Beautiful Soup. Another approach is to pass
> the document through HTML Tidy and then process the output.
>
> Cheers,
> Brian

Another possibility is HTMLFilter. It parses HTML 4 or
backward-compatible XHTML as a stream of events, more like SAX than
DOM, though you could still use it to build a DOM.  It's well suited
for modifying documents in place, because tags you don't need to
modify can pass straight through without risk of indigestion.

http://www.shearersoftware.com/software/developers/htmlfilter/
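
To illustrate the pass-through idea, here is a minimal sketch. It does
NOT use HTMLFilter's own API (see the URL above for that); it assumes
Python 3 and the standard library's html.parser module as a stand-in.
The class name PassThroughFilter, the RENAME table and the filter_html
helper are hypothetical, chosen only for this example. Start tags that
are not being changed are written back exactly as they appeared in the
input, so sloppy markup elsewhere in the document survives untouched.

```python
from html.parser import HTMLParser


class PassThroughFilter(HTMLParser):
    """Echo the input, changing only the tags listed in RENAME."""

    # Hypothetical edit for illustration: presentational -> semantic tags.
    RENAME = {"b": "strong", "i": "em"}

    def __init__(self):
        # convert_charrefs=False reports &amp; and &#160; as separate
        # events, so they can be written back instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_start(self, tag):
        raw = self.get_starttag_text()  # original source text of the tag
        if tag in self.RENAME:
            # Swap only the tag name; keep the attributes as written.
            raw = "<" + self.RENAME[tag] + raw[1 + len(tag):]
        self.out.append(raw)

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag)

    def handle_startendtag(self, tag, attrs):
        # Override the default, which would emit a start and an end event.
        self._emit_start(tag)

    def handle_endtag(self, tag):
        # The parser does not expose the raw end tag, so it is rebuilt
        # (and therefore lowercased) here.
        self.out.append("</%s>" % self.RENAME.get(tag, tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)

    # Marked sections such as CDATA are not handled in this sketch.


def filter_html(text):
    """Run text through the filter and return the rewritten markup."""
    parser = PassThroughFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(filter_html("<p class=x>Hello <b>world</b> &amp; <br/>more"))
```

Run on the sample string at the bottom, this should print
<p class=x>Hello <strong>world</strong> &amp; <br/>more, leaving the
unquoted attribute and the unclosed <p> exactly as they were. Building
a DOM instead would mean collecting the same events into a tree rather
than a list of strings.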

-- Andrew

