[XML-SIG] xml.dom.ext.reader.HtmlLib memory leak?

Daniel Veillard veillard at redhat.com
Wed Sep 1 11:32:17 CEST 2004


On Fri, Aug 27, 2004 at 07:52:16PM +0200, Walter Dörwald wrote:
> 
> This looks great. When I dump the DOM again, the resulting
> files look much better than those generated by HTMLParser
> from the standard library or my own HTML parser.

  Okay, don't forget to free the documents when you don't need them
anymore.
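
  A minimal sketch of that cleanup, assuming the libxml2 Python bindings,
where a document lives in C memory until freeDoc() is called on it. The
helper name parsed_html and its parse parameter are hypothetical and only
there for illustration; htmlParseFile and freeDoc are the binding calls:

```python
from contextlib import contextmanager


@contextmanager
def parsed_html(source, encoding=None, parse=None):
    """Yield a parsed HTML document and free it when the block exits.

    `parse` is a hypothetical hook (not part of libxml2); it defaults to
    libxml2.htmlParseFile and exists so the helper can be tried out
    without the bindings installed.
    """
    if parse is None:
        import libxml2  # lazy import; needs the libxml2 Python bindings
        parse = libxml2.htmlParseFile
    doc = parse(source, encoding)
    try:
        yield doc
    finally:
        # The document is not released by Python's garbage collector,
        # so free it explicitly even if the block raised.
        doc.freeDoc()


# Usage (requires libxml2 and network access):
# with parsed_html("http://www.python.org") as doc:
#     html = doc.serialize()
```

  The try/finally means the document is freed even when the code using it
raises, which is the case that is easiest to forget in a long-running
process.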

> BTW, I wonder why libxml2 complains about the following:
> 
> >>> doc = libxml2.htmlParseFile("http://www.python.org", None)
> http://www.python.org:3: HTML parser error : htmlParseStartTag: invalid element name
> <?xml-stylesheet href="./css/ht2html.css" type="text/css"?>

  It seems the HTML parser has no notion of processing instructions,
which is apparently a bug, cf.:
  http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.3.6

  Bug registered:
  http://bugzilla.gnome.org/show_bug.cgi?id=151584

> I think the next version of XIST will use libxml2 instead
> of uTidyLib for parsing HTML.

 Cool :-)

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard at redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

