Parsing HTML/XML documents
Max M
maxm at mxm.dk
Thu Apr 26 15:57:47 EDT 2007
Stefan Behnel wrote:
> pabloski at giochinternet.com wrote:
>> I need to parse real-world HTML/XML documents and I found two nice Python
>> solutions: BeautifulSoup and Tidy.
>
> There's also lxml, in case you want a real XML tool.
> http://codespeak.net/lxml/
> http://codespeak.net/lxml/dev/parsing.html#parsers
I have used both BeautifulSoup and lxml. They are both good tools.
lxml is blindingly fast compared to BeautifulSoup though.
A simple tool for importing contact information from 6000 XML files of
23 MBytes into Zope runs in about 30 seconds, with no optimisations at all,
just inefficient XPath expressions.
That is pretty good in my book.
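
To give a rough idea of what such an import step can look like, here is a
minimal sketch. It is not the actual tool: the <contacts>/<contact> layout,
the field names and the extract_contacts() helper are all made up for
illustration, and the Zope side is left out. It only uses the simple path
expressions that lxml shares with the standard library ElementTree API, so
it also runs where lxml is not installed.

```python
# Prefer lxml if it is installed; otherwise fall back to the stdlib
# ElementTree module, whose basic API lxml mirrors.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical contact file; the real file layout is not given in this post.
SAMPLE = b"""<contacts>
  <contact>
    <name>Ada Lovelace</name>
    <email>ada@example.org</email>
  </contact>
  <contact>
    <name>Alan Turing</name>
    <email>alan@example.org</email>
  </contact>
</contacts>"""


def extract_contacts(xml_bytes):
    """Return a list of (name, email) tuples from one XML document."""
    root = etree.fromstring(xml_bytes)
    contacts = []
    # ".//contact" matches every <contact> element below the root.
    for node in root.findall(".//contact"):
        contacts.append((node.findtext("name"), node.findtext("email")))
    return contacts


contacts = extract_contacts(SAMPLE)
print(contacts)
```

With lxml itself, elements also have an xpath() method that accepts full
XPath 1.0 expressions, for example root.xpath("//contact/name/text()").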
--
hilsen/regards Max M, Denmark
http://www.mxm.dk/
IT's Mad Science