Parsing HTML with xml.etree in Python 2.7?

Skip Montanaro skip.montanaro at gmail.com
Mon Oct 5 10:14:00 EDT 2015


Back before Fredrik Lundh's elementtree module was sucked into the Python
stdlib as xml.etree, I used to use his elementtidy extension module to
clean up HTML source so it could be parsed into an ElementTree object.
Elementtidy hasn't been updated in about ten years, and still assumes there
is a module named "elementtree" which it can import.
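
To illustrate the name change, here is a minimal sketch of the kind of
import fallback that would be needed. It is not elementtidy's actual code,
and the well-formed sample string is only there to show that the import
works; it is not real-world HTML.

```python
# Hypothetical compatibility shim: prefer the stdlib location and fall
# back to the old standalone package name if it is missing.
try:
    import xml.etree.ElementTree as ElementTree
except ImportError:
    # Older installs that only have the standalone elementtree package.
    from elementtree import ElementTree

# Well-formed markup parses fine; messy real-world HTML generally will not.
root = ElementTree.fromstring("<html><body><p>hello</p></body></html>")
print(root.find("body/p").text)
```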

I wouldn't be surprised if there were some small API changes beyond the
rename that came with the move into the xml package. Before I dive into a
rabbit hole and start to modify elementtidy, is there some other
stdlib-only way to parse HTML code into an xml.etree.ElementTree?

Thx,

Skip