[XML-SIG] Advice needed: RTF->XML conversions

Tony McDonald tony.mcdonald@ncl.ac.uk
Thu, 17 May 2001 09:14:34 +0100


On 17/5/01 8:49 am, "Alexandre Fayolle" <Alexandre.Fayolle@logilab.fr>
wrote:

> On Thu, 17 May 2001, Tony McDonald wrote:
> 
>> Can anyone suggest some (preferably Python-based) tools I can use to get
>> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
>> to an XML form?
>> 
>> If someone has written something that takes that (dreadful) 'XML' output
>> that Word 2001 outputs and cleans it up into valid XML that would be a great
>> start for me.
> 
> I don't have a coded solution, but if I were to do such a thing, I'd use the
> Automation interface of Word together with Python's COM interface on
> Windows to have Word parse the document for me, using the various iterators
> available in the Word Document interface and building my own XML.
> 
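
[A minimal sketch of the approach Alexandre describes, not code from either
poster. It assumes Windows, an installed copy of Word and the pywin32
package; the style-to-tag mapping and the function names are hypothetical.
It handles plain styled paragraphs only; tables, images and floating text
would need much more work.]

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word style names to XML tag names.
# Word style names are localized, so adjust these to your installation.
STYLE_TO_TAG = {"Heading 1": "h1", "Heading 2": "h2"}


def paragraphs_to_xml(paragraphs):
    """Build an XML string from (style_name, text) pairs."""
    root = ET.Element("document")
    for style, text in paragraphs:
        # Any style not in the mapping becomes a plain paragraph.
        element = ET.SubElement(root, STYLE_TO_TAG.get(style, "para"))
        element.text = text
    return ET.tostring(root, encoding="unicode")


def read_word_paragraphs(path):
    """Have Word parse the file and return (style_name, text) pairs.

    Only works on Windows with Word and pywin32 installed, so the
    import is kept local to this function.
    """
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    doc = word.Documents.Open(path)
    try:
        pairs = []
        for para in doc.Paragraphs:
            # Paragraph text ends with a paragraph mark (and a cell
            # marker inside tables); strip both.
            text = para.Range.Text.rstrip("\r\x07")
            pairs.append((para.Style.NameLocal, text))
        return pairs
    finally:
        doc.Close(False)  # close without saving changes
        word.Quit()
```

[On a Windows machine this would be used as
paragraphs_to_xml(read_word_paragraphs(r"C:\docs\example.doc")), where the
path is only a placeholder. The XML-building half has no Word dependency and
runs anywhere.]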

We have very little experience of doing things this way - we're a Unix and
Zope shop and try not to get too involved with the inner workings of
Microsoft software (if at all possible).

> This can be very simple if your document only uses the basic styles in
> Word (title 1, text body, toc... [I don't know the English names, only
> guessing here]), or dreadful if your document features images, tables,
> floating text sections, etc.
> 
> Alexandre Fayolle

Thanks for the advice, Alexandre, but it's the latter case, I'm afraid :(

Our documents have tables, images, superscripts/subscripts, Greek characters
(i.e. simple formulas), page breaks and more besides.

Cheers
Tone.
-- 
Dr Tony McDonald,  Assistant Director, FMCC, http://www.fmcc.org.uk/
The Medical School, Newcastle University Tel: +44 191 243 6140
A Zope list for UK HE/FE  http://www.fmcc.org.uk/mailman/listinfo/zope