[python-win32] Translating MS-Word documents

Tim Roberts timr at probo.com
Tue Oct 11 18:13:57 CEST 2005


On Tue, 11 Oct 2005 11:32:53 +0200 (CEST), Øyvind 
<python at kapitalisten.no> wrote:

>I need to translate several Word documents. I have a list of
>approximately 5000 words and their translations, and would like to read
>through a Word document, look for the words in the list and replace them.
>However, I need to keep the current formatting of the Word documents.
>(Using Word 2003 and XP).
>
>What is the best way of doing this as quickly and efficiently as possible?
>
>1) Search and replace for each word directly in Word
>
>2) Extract the text, run it through a regex and thereafter do a search
>and replace in Word.
>
>3) Some other way?
>
>(The only language I know is Python, so writing some C++ stuff that can do
>it a lot faster is not an option).
>  
>

This is a hard problem.

If you can let this run for a number of hours, the simplest answer is to 
use the Word object model to open each file in turn and use the Find 
object of a Range (for example, Document.Content.Find) to search and 
replace.  With 5000 words that means 5000 passes over each document, so 
it'll take a while, but the computer won't complain.  Here's an MSDN 
article that shows how to use Find and Replace within a selection; the 
same syntax should work with a Range such as Document.Content:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_wrcore/html/wrtskhowtoreplacetext.asp
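
As a rough illustration of the object-model approach, here is a minimal 
sketch using pywin32.  The function names replace_words and 
translate_file, and the idea of passing the word list as a dict, are my 
own assumptions for the example.  The numeric constants are Word's 
wdFindStop and wdReplaceAll values.  It has not been tuned for speed, and 
it only touches the main body text (doc.Content), not headers, footers, 
or footnotes.

```python
WD_FIND_STOP = 0    # Word's wdFindStop: do not wrap around at the end of the range
WD_REPLACE_ALL = 2  # Word's wdReplaceAll: replace every match in the range


def replace_words(doc, translations):
    """Run one whole-word Find/Replace over the document body per entry."""
    for source, target in translations.items():
        rng = doc.Content          # Range covering the main text of the document
        find = rng.Find
        find.ClearFormatting()
        find.Replacement.ClearFormatting()
        # Positional arguments of Find.Execute, in order:
        # FindText, MatchCase, MatchWholeWord, MatchWildcards, MatchSoundsLike,
        # MatchAllWordForms, Forward, Wrap, Format, ReplaceWith, Replace
        find.Execute(source, False, True, False, False,
                     False, True, WD_FIND_STOP, False, target, WD_REPLACE_ALL)


def translate_file(path, translations):
    """Open one document in Word, replace the words, save, and close."""
    # pywin32 is imported here so replace_words can be loaded without it.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(path)  # Word expects an absolute path
        try:
            replace_words(doc, translations)
            doc.Save()
        finally:
            doc.Close()
    finally:
        word.Quit()
```

Because the replacement is done by Word itself, the surrounding 
formatting should be left alone.  Starting Word once and looping over all 
the files, instead of once per file as above, would probably save a fair 
amount of time.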

However, in many cases, it is easier to use the Word macro recorder to 
record what you want to do ONCE, and then use the generated VBA to 
create your script.

If your document formatting will survive a change to RTF and back, you 
could convert to RTF (which is easily machine readable) and do the 
replacements in plain text.  However, few documents survive that change 
completely intact.
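
If you do go the RTF route, the replacement step itself can be done in a 
single pass with one compiled regex instead of 5000 separate searches.  
The following is only a sketch: build_translator is a made-up name, and 
it assumes the words appear as plain ASCII in the RTF text.  Non-ASCII 
letters are normally written as escape sequences in RTF, and Word may 
split a word across formatting runs, so real files will need more care.

```python
import re


def build_translator(translations):
    """Return a function that replaces every listed word in a single pass."""
    # Longest words first, so a longer entry wins over a shorter one
    # that shares its beginning.
    words = sorted(translations, key=len, reverse=True)
    # (?<![\\\w]) rejects a match preceded by a backslash or a word
    # character, which keeps RTF control words such as \par untouched;
    # (?!\w) makes sure the match ends at the end of a word.
    pattern = re.compile(
        r"(?<![\\\w])(?:" + "|".join(re.escape(w) for w in words) + r")(?!\w)"
    )
    return lambda text: pattern.sub(lambda m: translations[m.group(0)], text)


translate = build_translator({"hus": "house", "par": "pair"})
print(translate(r"{\b hus}\par et par hus"))
```

The lookup is case-sensitive as written, so "Hus" at the start of a 
sentence would need its own entry in the list.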

-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.


