[python-win32] Translating MS-Word documents
Tim Roberts
timr at probo.com
Tue Oct 11 18:13:57 CEST 2005
On Tue, 11 Oct 2005 11:32:53 +0200 (CEST), Øyvind
<python at kapitalisten.no> wrote:
>I need to translate several Word documents. I have a list of
>approximately 5000 words and their translations, and would like to read
>through a Word document, look for the words in the list and replace them.
>However, I need to keep the current formatting of the Word documents.
>(Using Word 2003 and XP.)
>
>What is the best way of doing this as quickly and efficiently as possible?
>
>1) Search and replace for each word directly in Word
>
>2) Extract the text, run it through a regex and thereafter do a search
>and replace in Word.
>
>3) Some other way?
>
>(The only language I know is Python, so writing some C++ stuff that can do
>it a lot faster is not an option).
>
>
This is a hard problem.
If you can let this run for a number of hours, the simplest answer is to
use the Word object model to open each file in turn and use the Find
object of the document's Content range (Document.Content.Find) to search
and replace. It'll take a while, but the computer won't complain. Here's
an MSDN article that shows how to use Find and Replace within a
selection; the same syntax should work with a Range taken from a
Document:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_wrcore/html/wrtskhowtoreplacetext.asp
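From Python, the same idea might look roughly like the sketch below. It
assumes pywin32 (win32com.client) and an installed copy of Word; the
helper names replace_words and translate_file are made up for this
example, the numeric constants are the values of wdFindContinue and
wdReplaceAll, and matching whole words only is my choice, not something
the article prescribes.

```python
import os

WD_FIND_CONTINUE = 1   # value of Word's wdFindContinue
WD_REPLACE_ALL = 2     # value of Word's wdReplaceAll


def replace_words(doc, translations):
    """Run one replace-all pass per dictionary entry on an open document."""
    for source, target in translations.items():
        find = doc.Content.Find          # Find object for the main text
        find.ClearFormatting()
        find.Replacement.ClearFormatting()
        # Positional arguments of Find.Execute: FindText, MatchCase,
        # MatchWholeWord, MatchWildcards, MatchSoundsLike,
        # MatchAllWordForms, Forward, Wrap, Format, ReplaceWith, Replace.
        find.Execute(source, False, True, False, False, False,
                     True, WD_FIND_CONTINUE, False, target, WD_REPLACE_ALL)


def translate_file(path, translations):
    """Open one file in Word, apply the replacements, save and close."""
    # Imported here so replace_words can be tried without Word installed.
    import win32com.client
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word resolves relative paths against its own working
        # directory, so pass an absolute path.
        doc = word.Documents.Open(os.path.abspath(path))
        try:
            replace_words(doc, translations)
            doc.Save()
        finally:
            doc.Close()
    finally:
        word.Quit()
```

Note that Document.Content covers only the main body text, so headers,
footers and text boxes would need their own pass. With 5000 entries this
is 5000 Find calls per document, which is where the hours go.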
However, in many cases, it is easier to use the Word macro recorder to
record what you want to do ONCE, and then use the generated VBA to
create your script.
If your document formatting will survive a change to RTF and back, you
could convert to RTF (which is easily machine readable) and do the
replacements in plain text. However, few documents survive that change
completely intact.
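If you do try the RTF route, the replacement step could be sketched as a
single regex pass, as below. The name build_replacer is made up for this
example, and the pattern rests on two assumptions: that a word is a run
of ASCII letters, and that anything directly after a backslash is an RTF
control word and must be left alone. RTF writes non-ASCII characters as
escapes, so entries containing such characters would have to be escaped
the same way before matching, and a word that Word has split across
formatting runs will not be found.

```python
import re


def build_replacer(translations):
    """Return a function that replaces whole words in RTF source text."""
    # Longest entries first, so a longer entry is preferred over any
    # shorter entry that is a prefix of it.
    words = sorted(translations, key=len, reverse=True)
    alternation = "|".join(re.escape(w) for w in words)
    # Reject matches preceded by a letter or a backslash (control word)
    # and matches followed by a letter (part of a longer word).
    pattern = re.compile(r"(?<![A-Za-z\\])(" + alternation + r")(?![A-Za-z])")

    def replace(text):
        return pattern.sub(lambda m: translations[m.group(1)], text)

    return replace


replacer = build_replacer({"hus": "house", "bil": "car"})
print(replacer(r"{\rtf1\ansi hus og bil\par}"))
```

One compiled pattern scans each file once, rather than once per entry in
the word list.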
--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.