Parse and clean ODT docs with lxml? Hints to get started?

kaer kaerbuhez at gmail.com
Fri Jun 4 03:40:58 EDT 2010


Basically, I have to upgrade a website with a lot of new content, which I
received as documents in the OpenOffice (ODT) format. If I open one of
those documents and save it as HTML, I can cut and paste the result into
the HTML page. That's not bad as a start, but I need to clean up that
HTML (remove tags, remove or change attributes, ...). My first idea is to
use lxml for that. My questions:
- Is there a better way?
- Is lxml the right tool for that?
- Are there any code examples for doing that?
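
To make the kind of cleanup concrete, here is a rough sketch of what I have in mind, assuming lxml is installed and the exported HTML is available as a string. The tag and attribute lists (UNWRAP_TAGS, DROP_TAGS, KEEP_ATTRS) and the clean_fragment helper are hypothetical names chosen for illustration, not part of lxml, and the sample markup only imitates what an OpenOffice export might look like.

```python
from lxml import etree, html

# Hypothetical cleanup policy; adjust to the real export.
UNWRAP_TAGS = ("font", "span")            # remove the tag, keep its text and children
DROP_TAGS = ("style", "script", "meta")   # remove the tag together with its content
KEEP_ATTRS = {"href", "src", "alt", "title"}  # every other attribute is deleted


def clean_fragment(markup):
    """Return a cleaned copy of an HTML fragment as a string."""
    root = html.fromstring(markup)

    # Drop unwanted elements entirely, but keep the text that follows them.
    etree.strip_elements(root, *DROP_TAGS, with_tail=False)

    # Unwrap presentational tags: their text and children move into the parent.
    etree.strip_tags(root, *UNWRAP_TAGS)

    # Remove every attribute that is not explicitly allowed.
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # skip comments and processing instructions
        for name in list(el.attrib):
            if name not in KEEP_ATTRS:
                del el.attrib[name]

    return html.tostring(root, encoding="unicode")


# Sample markup imitating an office-suite HTML export (made up for this sketch).
sample = (
    '<div><p class="western" style="margin-bottom: 0cm">'
    '<font face="Arial"><span lang="en-US">Hello</span></font> '
    '<a href="x.html" class="c">link</a></p></div>'
)
print(clean_fragment(sample))
```

Changing an attribute rather than removing it would be a matter of assigning to el.attrib[name] (or calling el.set) in the same loop. Whether a whitelist of attributes like this is the right approach is exactly the kind of thing I'm unsure about.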

Have a nice day.



More information about the Python-list mailing list