[python-win32] Translating MS-Word documents

bob bgailer at alum.rpi.edu
Wed Oct 12 19:28:29 CEST 2005


At 04:28 AM 10/12/2005, Øyvind wrote:
>Hello, and thank you for the answers so far.
>
>The documents can be huge, as in 4,000-5,000 pages, and there are up to 7*5000
>words that need to be replaced. (As you have pointed out, it is a
>translation between languages, but for a very specialised branch of patents.
>Therefore no translators know most of these words. They
>will be changed, and afterwards someone might spend a few weeks or months
>getting it correct.)
>
>I don't really think converting it to RTF is a solution. The formatting is
>very important.

You don't lose formatting with RTF.

What didn't you like about my proposal to run the Words collection through a 
dictionary?

For what it's worth, I created a 405-word dictionary and ran a 1000-word 
document through it. Every word in the document was in the dictionary. On my 
machine, which is at least 3 years old (a 1 GHz CPU, I think), it took 16 seconds.

Here's the code if you want to experiment with it. I still think it is the 
fastest solution, but YMMV.

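import time

import win32com.client

# I assume you have a dictionary of words & translations named t
from translations import t

w = win32com.client.Dispatch('Word.Application')
d = w.Documents.Open('c:/foo.doc')

def main():
    start = time.perf_counter()   # time.clock() was removed in Python 3.8
    # Walk the 1-based Words collection backwards, so that a translation
    # made of several words does not shift the indices still to be visited.
    for i in range(d.Words.Count, 0, -1):
        word = d.Words(i)
        text = word.Text
        key = text.rstrip()       # Word keeps the trailing space in each word
        try:
            # KeyError if the word is not in t; a COM error if the text
            # can't be reassigned (e.g. the word is inside a link).
            word.Text = t[key] + text[len(key):]
        except Exception:
            pass
    print(time.perf_counter() - start)

main()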

The time is proportional to the number of words in the document. The size of 
the dictionary should not radically affect the time, since a Python dictionary 
lookup takes roughly constant time regardless of how many keys it holds. The 
reason for the try is that some words in my sample document were in links, and 
reassigning the text failed. It also covers the case where the word is not a 
dictionary key.
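A rough way to check that dictionary size barely matters is to time the same number of lookups against a small and a large dictionary. This is a sketch with synthetic words (`word0`, `word1`, ... mapped to made-up translations), not the translations module used above, and `lookup_time` is a hypothetical helper. Python dicts are hash tables, so the average lookup cost should not grow with the number of keys, although cache effects may make the larger dictionary somewhat slower.

```python
import timeit


def lookup_time(dict_size: int, n_lookups: int = 100_000) -> float:
    """Time n_lookups lookups against a dict holding dict_size keys.

    The words and translations are synthetic placeholders, not real data.
    """
    t = {f"word{i}": f"ord{i}" for i in range(dict_size)}
    keys = [f"word{i % dict_size}" for i in range(n_lookups)]
    # dict.get returns None for a missing key instead of raising KeyError.
    return timeit.timeit(lambda: [t.get(k) for k in keys], number=1)


if __name__ == "__main__":
    small = lookup_time(405)      # size of the test dictionary mentioned above
    large = lookup_time(35_000)   # roughly 7 * 5000 entries
    print(f"405 keys: {small:.4f}s, 35000 keys: {large:.4f}s")
```

The two timings should be of the same order; the exact numbers depend on the machine.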

Most of the time is spent looping over and accessing the words, about 1% on 
dictionary lookups, and the rest on reassigning the text.
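A breakdown like the one above can be reproduced by accumulating the time spent in each phase. This is a minimal sketch: `profile_translation` is a hypothetical helper that accepts any sequence of objects with a read/write `Text` attribute, such as the items returned by `d.Words(i)` in the code above, so it can be tried with stand-in objects before pointing it at Word. The per-phase figures include some `perf_counter` overhead and are only approximate.

```python
import time


def profile_translation(words, t):
    """Translate each word in place; return seconds spent per phase.

    words: objects with a read/write .Text attribute (e.g. Word ranges).
    t: dict mapping words to their translations.
    """
    totals = {"access": 0.0, "lookup": 0.0, "assign": 0.0}
    for word in words:
        t0 = time.perf_counter()
        text = word.Text                  # a COM read when word is a Word range
        t1 = time.perf_counter()
        key = text.rstrip()               # drop the trailing space Word includes
        new = t.get(key)
        t2 = time.perf_counter()
        totals["access"] += t1 - t0
        totals["lookup"] += t2 - t1
        if new is not None:
            word.Text = new + text[len(key):]   # a COM write; keeps the space
            totals["assign"] += time.perf_counter() - t2
    return totals
```

Against a real document this could be called as `profile_translation([d.Words(i) for i in range(1, d.Words.Count + 1)], t)`; note that fetching the ranges in that list happens before the timed loop, so it is not counted as "access".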

>The company does have Word macros today that do the job. But, as you might
>imagine, they are very hard to maintain and have lots of 'issues'. It started
>out with a few words 6-7 years ago, and has grown.
>
>Will there be an increase in speed if I pull out all the text, run it through
>a regex, and then do a Word Search and Replace only for the words that
>the regex finds, instead of doing a complete Search and Replace in Word?
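The regex approach asked about here could be tried along these lines. This is a sketch, not tested against Word: read the whole text with one COM call (`d.Content.Text`), find which dictionary words actually occur, and run Word's Find/Replace only for those. `find_used_keys` and `replace_in_word` are hypothetical helper names; the `Find.Execute` argument order and the constants wdFindContinue = 1 and wdReplaceAll = 2 follow Word's object model. Whether this beats the per-word loop depends on how many distinct dictionary words occur in a document, and it has not been timed here.

```python
import re


def find_used_keys(text, t):
    """Return the keys of t that occur as whole words in text.

    Tokenising on runs of word characters assumes single-word keys, like
    the per-word loop; multi-word phrases would need a different pattern.
    """
    return {token for token in re.findall(r"\w+", text) if token in t}


def replace_in_word(doc, t, keys):
    """Run one Word Find/Replace All per key that actually occurs.

    doc is an open Word Document obtained through win32com.
    Caveat: a translation that itself contains another key may be
    translated a second time, which the per-word loop avoids.
    """
    WD_FIND_CONTINUE = 1   # wdFindContinue
    WD_REPLACE_ALL = 2     # wdReplaceAll
    for key in keys:
        rng = doc.Content  # a fresh Range over the whole document each time
        # Positional arguments: FindText, MatchCase, MatchWholeWord,
        # MatchWildcards, MatchSoundsLike, MatchAllWordForms, Forward,
        # Wrap, Format, ReplaceWith, Replace
        rng.Find.Execute(key, True, True, False, False, False,
                         True, WD_FIND_CONTINUE, False, t[key], WD_REPLACE_ALL)
```

With the document and dictionary from the code above, usage would be `replace_in_word(d, t, find_used_keys(d.Content.Text, t))`.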
>
>Thanks in advance.
>
>_______________________________________________
>Python-win32 mailing list
>Python-win32 at python.org
>http://mail.python.org/mailman/listinfo/python-win32


