Parsing HTML/XML documents

Thu Apr 26 15:57:47 EDT 2007

Stefan Behnel wrote:
> pabloski at giochinternet.com wrote:
>> I need to parse real world HTML/XML documents and I found two nice python
>> solution: BeautifulSoup and Tidy.
> 
> There's also lxml, in case you want a real XML tool.
> http://codespeak.net/lxml/
> http://codespeak.net/lxml/dev/parsing.html#parsers

I have used both BeautifulSoup and lxml. They are both good tools.

lxml is blindingly fast compared to BeautifulSoup though.

A simple tool I wrote imports contact information from 6000 XML files
(23 MB in total) into Zope in about 30 seconds. It has no optimisations
at all, just inefficient XPath expressions.
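The post doesn't show the import tool's code. Below is a minimal, hypothetical sketch of that kind of job. It assumes each file looks like <contact><name/><email/><phones><phone/>...</phones></contact>, and it leaves out the Zope import step entirely. It uses lxml when it is installed and falls back to the stdlib ElementTree otherwise. It sticks to findtext()/findall() paths because that limited XPath subset works with both libraries.

```python
# Hypothetical sketch: extract contact fields from many small XML files.
# The file layout and field names are assumptions, not taken from the post.
try:
    from lxml import etree  # fast C-based parser, as recommended in the thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same basic API

import glob
import os


def parse_contact(xml_bytes):
    """Return a dict of contact fields parsed from one XML document."""
    root = etree.fromstring(xml_bytes)
    # findtext()/findall() accept simple paths and work with both libraries.
    return {
        "name": root.findtext("name", default=""),
        "email": root.findtext("email", default=""),
        "phones": [p.text for p in root.findall("phones/phone")],
    }


def load_contacts(directory):
    """Parse every *.xml file in a directory (hypothetical layout)."""
    contacts = []
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        with open(path, "rb") as f:  # bytes, so encoding declarations are honoured
            contacts.append(parse_contact(f.read()))
    return contacts


if __name__ == "__main__":
    sample = (b"<contact><name>Ada</name><email>ada@example.com</email>"
              b"<phones><phone>123</phone><phone>456</phone></phones></contact>")
    print(parse_contact(sample))
    # {'name': 'Ada', 'email': 'ada@example.com', 'phones': ['123', '456']}
```

With lxml you can also use full XPath 1.0 through root.xpath("..."). That is closer to what the post describes, but it only works with lxml and not with the stdlib fallback.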

That is pretty good in my book.

-- 

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science