Any equivalent to Ruby's 'hpricot' html/xpath/css selector package?

Mark Thomas mark at thomaszone.com
Mon Dec 29 07:40:22 EST 2008


On Dec 28, 6:22 pm, Kenneth McDonald
<kenneth.m.mcdon... at sbcglobal.net> wrote:
> Ruby has a package called 'hpricot' which can perform limited xpath  
> queries, and CSS selector queries. However, what makes it really  
> useful is that it does a good job of handling the "broken" html that  
> is so commonly found on the web. Does Python have anything similar,  
> i.e. something that will not only do XPath queries, but will do so on  
> imperfect HTML?

Hpricot is a fine package but I prefer Nokogiri (see
http://www.rubyinside.com/nokogiri-ruby-html-parser-and-xml-parser-1288.html)
because it is based on libxml2 and therefore is faster, conforms to
the full XPath 1.0 spec, works on imperfect HTML, and exposes the
Hpricot API.

In Python, the equivalent is lxml (http://codespeak.net/lxml/), which
is similarly based on libxml2, very fast, XPath 1.0 conformant, and
exposes the now-standard ElementTree API. Its lxml.html module uses
libxml2's HTML parser, so it also copes with imperfect HTML.

The main difference is that CSS selectors in lxml live in a separate
module (lxml.cssselect), which translates them to XPath, but IMHO
that's a gimmick when you have a full XPath 1.0 engine at your
disposal.
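
To make that concrete, here is a minimal sketch of running an XPath query over sloppy HTML with lxml. The markup, the example.com URLs, and the extract_links helper are made up for illustration; only html.fromstring, xpath, get, and text_content come from lxml itself.

```python
from lxml import html

# Deliberately sloppy, made-up markup: unclosed <p> and <li> tags,
# and no enclosing <html> or <body>.
BROKEN_HTML = """
<div id="links">
  <p>First paragraph
  <p>Second paragraph with <a href="http://example.com/a">link A</a>
  <ul>
    <li><a href="http://example.com/b">link B</a>
    <li>no link here
  </ul>
</div>
"""


def extract_links(markup):
    """Return (href, text) pairs for every <a href> inside div#links.

    Hypothetical helper, not part of lxml.
    """
    # libxml2's HTML parser repairs the markup while building the tree.
    tree = html.fromstring(markup)
    # Ordinary XPath 1.0 expression; results come back in document order.
    anchors = tree.xpath('//div[@id="links"]//a[@href]')
    return [(a.get("href"), a.text_content().strip()) for a in anchors]


links = extract_links(BROKEN_HTML)
for href, text in links:
    print(href, text)
```

The same tree also supports the ElementTree-style calls (find, findall, iteration), so code written against ElementTree should mostly carry over.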

-- Mark.


