Create a string array of all comments in an HTML file...

Stefan Behnel stefan.behnel-n05pAM at web.de
Sat Oct 6 16:29:29 EDT 2007


sophie_newbie wrote:
> Hi, I'm wondering how I'd go about extracting a string array of all
> comments in an HTML file, HTML comments obviously taking the format
> "<!-- Comment text here -->".
> 
> I'm fairly stumped on how to do this. Maybe using regular expressions?


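   from lxml import etree

   # HTMLParser tolerates the broken markup found in real-world HTML
   parser = etree.HTMLParser()
   tree = etree.parse("somefile.html", parser)

   # "//comment()" selects every comment node in the document;
   # each node's .text attribute holds the comment body as a string
   comments = [c.text for c in tree.xpath("//comment()")]
   print(comments)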
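
If lxml is not available, roughly the same result can be had from the standard library. The following is only a sketch of that alternative, not the lxml approach above: it uses html.parser from Python 3, and the names CommentCollector and extract_comments are made up for illustration. A parser is still preferable to regular expressions here, since a regex would also match comment-like text inside scripts or attribute values.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: gathers the text of every HTML comment."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->", whitespace included
        self.comments.append(data)


def extract_comments(html_text):
    """Return a list of comment strings found in html_text."""
    collector = CommentCollector()
    collector.feed(html_text)
    collector.close()
    return collector.comments


sample = "<html><body><!-- first --><p>text</p><!-- second --></body></html>"
comments = extract_comments(sample)
print(comments)
```

The strings keep their surrounding whitespace; call .strip() on each one if that is not wanted.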


http://codespeak.net/lxml

Stefan


