Analysing Word documents (slow) What's wrong with this code please!

Mon Jan 19 09:34:59 EST 2004

jmdeschamps wrote:
> Anyone has a hint how else to get faster results?
> (This is to find out what was bold in the document, in order to grab
> documents produced in word and generate html (web pages) and xml
> (straight data) versions)
> 
> # START ========================
> import win32com.client
> import tkFileDialog, time
> 
> # Launch Word
> MSWord = win32com.client.Dispatch("Word.Application")
> 
> myWordDoc = tkFileDialog.askopenfilename()
> 
> MSWord.Documents.Open(myWordDoc)
> 
> boldRanges=[]  #list of bold ranges
> boldStart = -1
> boldEnd = -1
> t1= time.clock()
> for i in range(len(MSWord.Documents[0].Content.Text)):
>     if MSWord.Documents[0].Range(i,i+1).Bold  : # testing for bold property

I only vaguely know how pythoncom works, but you'd really better avoid asking 
for MSWord.Documents[0] at each loop step: pythoncom fetches the COM objects 
corresponding to every attribute and method you ask for dynamically, and that 
may cost a lot of time. So doing:

doc = MSWord.Documents[0]
for i in range(len(doc.Content.Text)):
   if doc.Range(i,i+1).Bold: ...

may greatly improve performance.
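
To make the idea concrete, here is a minimal sketch that takes the document object as a parameter, so it is looked up once by the caller and reused for every character. The function name `bold_flags` and the `Fake*` classes are my own assumptions: the fakes only imitate the few attributes used here so the snippet runs without Word. With pywin32 you would pass MSWord.Documents[0] instead. I have not timed this against a real document.

```python
def bold_flags(doc):
    """Return one boolean per character, True where that character is bold.

    `doc` is resolved once by the caller and reused on every iteration,
    instead of re-evaluating MSWord.Documents[0] for each character.
    """
    length = len(doc.Content.Text)  # text fetched once, before the loop
    flags = []
    for i in range(length):
        flags.append(bool(doc.Range(i, i + 1).Bold))
    return flags


# Stand-in objects (hypothetical, not part of pywin32) so the sketch
# can run without Word installed.
class FakeRange:
    def __init__(self, bold):
        self.Bold = bold


class FakeContent:
    def __init__(self, text):
        self.Text = text


class FakeDoc:
    def __init__(self, text, bold_positions):
        self.Content = FakeContent(text)
        self._bold = set(bold_positions)

    def Range(self, start, end):
        # Only the start position matters for one-character ranges.
        return FakeRange(start in self._bold)


flags = bold_flags(FakeDoc("abcdef", [1, 2, 4]))
```

There is still one Range call per character, so this only removes the repeated Documents[0] lookup, not the per-character COM traffic.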

>         if boldStart == -1:
>             boldStart=i
>         else:
>             boldEnd= i
>     else:
>         if boldEnd != -1:
>             boldRanges.append((boldStart,boldEnd))
>             boldStart= -1
>             boldEnd = -1          
> t2 = time.clock()
> MSWord.Quit()
> 
> print boldRanges  #see what we got
> print "Analysed in ",t2-t1
> # END =====================================
> 
> Thanks in advance
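
Independent of speed, one detail in the quoted loop looks worth checking: a run is only appended when a non-bold character follows it and boldEnd has been set. A one-character bold run, or a run that reaches the end of the document, would therefore seem to be lost. Below is a minimal sketch of the grouping step on a plain list of booleans. The name `group_runs` is hypothetical, and the half-open (start, end) convention is my choice, not something taken from your code.

```python
def group_runs(flags):
    """Turn a list of booleans into (start, end) pairs, end exclusive."""
    runs = []
    start = None
    for i, is_bold in enumerate(flags):
        if is_bold and start is None:
            start = i                    # a bold run begins here
        elif not is_bold and start is not None:
            runs.append((start, i))      # the run ended just before i
            start = None
    if start is not None:                # run that reaches the end of the text
        runs.append((start, len(flags)))
    return runs


runs = group_runs([False, True, True, False, True, False, True])
```

With half-open pairs, a single bold character at position 4 comes out as (4, 5), which matches the doc.Range(i, i+1) calls used in the loop.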

-- 
- Eric Brunel <eric dot brunel at pragmadev dot com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com