Any python module for Traversing HTML files

Stefan Behnel stefan.behnel-n05pAM at web.de
Tue Jul 24 14:46:19 EDT 2007


johnny wrote:
> Any python module for navigating and selecting, parsing HTML files?

Since you didn't name any specific requirements, consider taking the best one.
"lxml.html" provides loads of goodies: Python iterators, XPath and CSS
selectors for navigation, and a clean() function for removing junk from HTML
pages.
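
To give an idea of what the iterator and XPath navigation looks like, here is
a minimal sketch. It assumes an installed lxml that ships the lxml.html
package; the calls are the ones found in released lxml versions, so the SVN
branch may differ in details. The sample markup and the extract() helper are
made up for illustration.

```python
import lxml.html

# Made-up sample page, only for illustration.
SAMPLE = """
<html><body>
  <h1>Links</h1>
  <p class="intro">See <a href="http://codespeak.net/lxml/">lxml</a>.</p>
  <p>Also <a href="/docs">the docs</a>.</p>
</body></html>
"""

def extract(html_text):
    """Hypothetical helper: return all link targets and the intro text."""
    root = lxml.html.fromstring(html_text)

    # Iterator-style navigation: walk all <a> elements in document order.
    hrefs = [a.get("href") for a in root.iter("a")]

    # XPath selection: pick the paragraph marked class="intro".
    intro = root.xpath('//p[@class="intro"]')
    intro_text = intro[0].text_content() if intro else None

    return hrefs, intro_text

hrefs, intro_text = extract(SAMPLE)
print(hrefs)
print(intro_text)
```

CSS selection and the cleaning support follow the same pattern, but they are
left out here because, depending on the lxml version, they may need extra
packages.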

The downside: there's no official release yet, so you currently have to build
it from the SVN branch sources. An official alpha release of lxml 2.0, which
will include it, is due soon.

http://codespeak.net/lxml/

http://codespeak.net/svn/lxml/branch/html/

Stefan


