xml.etree - why no HTMLTreeBuilder included?

Jon P. jbperez at gmail.com
Sun Sep 26 17:29:03 EDT 2010


It is great that Fredrik Lundh's ElementTree is now a part of the
Python Standard Library.

However, is it correct that if you want to use xml.etree.ElementTree
to parse an HTML document, you have to install a separate
HTMLTreeBuilder (e.g. TidyHTMLTreeBuilder), and that the only
TreeBuilder that comes with the Standard Library is the one for
XML source?

Seems like some kind of HTMLTreeBuilder ought to be included by
default.
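
To illustrate the kind of thing I mean, here is a rough sketch that uses
only the standard library, assuming Python 3's html.parser.HTMLParser
feeding events into xml.etree.ElementTree.TreeBuilder. The names
SimpleHTMLTreeBuilder, VOID_TAGS and parse_html are made up for this
example. It is not the HTMLTreeBuilder from the ElementTree add-ons, and
it does not repair badly broken markup the way Tidy or lxml would (for
instance, it does not apply HTML's implied end-tag rules, so a <p> that
is never closed will swallow a following <p>).

```python
from html.parser import HTMLParser
from xml.etree.ElementTree import TreeBuilder

# Elements that never take a closing tag in HTML (hypothetical helper set).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SimpleHTMLTreeBuilder(HTMLParser):
    """Hypothetical minimal HTML-to-ElementTree builder (illustration only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._builder = TreeBuilder()
        self._open = []  # tags of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Bare attributes such as <input disabled> arrive with value None.
        attrib = {k: (v if v is not None else "") for k, v in attrs}
        self._builder.start(tag, attrib)
        if tag in VOID_TAGS:
            self._builder.end(tag)  # close void elements right away
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self._open:
            return  # stray end tag, or end tag of a void element: ignore
        # Close any elements that were left open inside this one.
        while self._open:
            current = self._open.pop()
            self._builder.end(current)
            if current == tag:
                break

    def handle_data(self, data):
        if self._open:  # drop text that appears outside the root element
            self._builder.data(data)

    def close(self):
        super().close()
        while self._open:  # close whatever is still open at end of input
            self._builder.end(self._open.pop())
        return self._builder.close()  # the root Element


def parse_html(text):
    """Parse an HTML string and return the root Element."""
    parser = SimpleHTMLTreeBuilder()
    parser.feed(text)
    return parser.close()
```

With something like this, root = parse_html(page_source) gives an ordinary
Element, so root.iter("a") and the other ElementTree calls work as usual.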

For a script I'm writing that deals with HTML, I thought I could
jettison lxml and use xml.etree instead. But since I would have to
ask the end user to install an external library anyway, even with
xml.etree, I switched back to lxml.


