[XML-SIG] Advice needed: RTF->XML conversions

Alexandre Fayolle Alexandre.Fayolle@logilab.fr
Thu, 17 May 2001 09:49:08 +0200 (CEST)


On Thu, 17 May 2001, Tony McDonald wrote:

> Can anyone suggest some (preferably python based) tools I can use to get
> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
> to an XML form?
> 
> If someone has written something that takes that (dreadful) 'XML' output
> that Word 2001 outputs and cleans it up into valid XML that would be a great
> start for me.

I don't have a coded solution, but if I were to do such a thing, I'd use the
Automation interface of Word together with Python's COM interface on
Windows to have Word parse the document for me, walking it with the various
iterators available in the Word Document interface and building my own XML.

This can be very simple if your document only uses the basic styles in
Word (title 1, text body, toc... [I don't know the English names, only
guessing here]), or dreadful if your document features images, tables,
floating text sections, etc.
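
To make the idea concrete, here is a rough, untested-against-Word sketch of
the simple case. It assumes the pywin32 package (win32com) and a local Word
installation on Windows. The style names in STYLE_TO_TAG, the XML element
names, and both function names are made up for illustration; Word reports
localized style names, so the table would have to match your own documents.
Images, tables and floating text are not handled at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word style names to XML element names.
# These are guesses at the English names; adjust to your documents.
STYLE_TO_TAG = {
    "Title": "title",
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Body Text": "para",
    "Normal": "para",
}


def paragraphs_to_xml(paragraphs, root_tag="document"):
    """Build an XML tree from an iterable of (style_name, text) pairs."""
    root = ET.Element(root_tag)
    for style_name, text in paragraphs:
        # Word paragraph text ends with a paragraph mark; strip it.
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        elem = ET.SubElement(root, STYLE_TO_TAG.get(style_name, "para"))
        elem.text = text
        if style_name not in STYLE_TO_TAG:
            # Keep the unknown style name so nothing is silently lost.
            elem.set("style", style_name)
    return root


def read_word_paragraphs(path):
    """Yield (style_name, text) for each paragraph of a Word document.

    Windows only. Assumes pywin32 and Word are installed; `path` should
    be an absolute path.
    """
    import win32com.client  # imported lazily so the rest runs anywhere

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(path)
        try:
            for para in doc.Paragraphs:
                yield str(para.Style.NameLocal), para.Range.Text
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

You would then call paragraphs_to_xml(read_word_paragraphs(path)) and
serialize the result with ET.tostring(root, encoding="unicode"). Keeping the
COM part separate from the XML part means the mapping can be tried out with
plain (style, text) tuples on a machine without Word.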

Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).