Analysing Word documents (slow) What's wrong with this code please!
Eric Brunel
eric.brunel at N0SP4M.com
Mon Jan 19 09:34:59 EST 2004
jmdeschamps wrote:
> Does anyone have a hint on how else to get faster results?
> (This is to find out what was bold in the document, in order to grab
> documents produced in Word and generate HTML (web pages) and XML
> (straight data) versions)
>
> # START ========================
> import win32com.client
> import tkFileDialog, time
>
> # Launch Word
> MSWord = win32com.client.Dispatch("Word.Application")
>
> myWordDoc = tkFileDialog.askopenfilename()
>
> MSWord.Documents.Open(myWordDoc)
>
> boldRanges=[] #list of bold ranges
> boldStart = -1
> boldEnd = -1
> t1= time.clock()
> for i in range(len(MSWord.Documents[0].Content.Text)):
>     if MSWord.Documents[0].Range(i,i+1).Bold : # testing for bold property
Knowing only vaguely how pythoncom works, I'd still strongly advise against
asking for MSWord.Documents[0] at each loop step: pythoncom dynamically fetches
the COM objects corresponding to every attribute and method you ask for, and
that may cost a lot of time. So doing:

    doc = MSWord.Documents[0]
    for i in range(len(doc.Content.Text)):
        if doc.Range(i,i+1).Bold: ...

may greatly improve performance.
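For what it's worth, here is a slightly fuller sketch of the same idea that can be tried without Word running. The function takes the document object once and only touches that local reference inside the loop. FakeDocument, FakeContent and FakeRange are hypothetical stand-ins I made up for testing; they are not part of win32com and only mimic the Content.Text and Range(start, end).Bold attributes used in the quoted code. Note that this version also records one-character bold runs and a run that reaches the end of the text, which the quoted loop does not.

```python
def bold_ranges(doc):
    """Return (start, end) index pairs, inclusive, of consecutive bold characters.

    doc is any object exposing Content.Text and Range(start, end).Bold,
    which is the part of the Word object model used in the quoted code.
    """
    length = len(doc.Content.Text)   # fetch the text only once
    ranges = []
    start = -1                       # -1 means "not inside a bold run"
    for i in range(length):
        if doc.Range(i, i + 1).Bold:
            if start == -1:
                start = i            # a new bold run begins here
        elif start != -1:
            ranges.append((start, i - 1))
            start = -1
    if start != -1:                  # the text ended inside a bold run
        ranges.append((start, length - 1))
    return ranges


# Hypothetical stand-ins (assumptions, not win32com classes) so the
# function can be exercised without a Word installation.
class FakeRange(object):
    def __init__(self, bold):
        self.Bold = bold


class FakeContent(object):
    def __init__(self, text):
        self.Text = text


class FakeDocument(object):
    def __init__(self, text, bold_flags):
        self.Content = FakeContent(text)
        self._bold = bold_flags      # one boolean per character

    def Range(self, start, end):
        return FakeRange(all(self._bold[start:end]))


sample = "plain BOLD x"
demo = FakeDocument(sample, [c.isupper() for c in sample])
print(bold_ranges(demo))             # [(6, 9)]
```

With a real document, the call would presumably be bold_ranges(MSWord.Documents[0]), so that Documents[0] is resolved exactly once. Each doc.Range(i, i+1) is still a separate COM call, so this only removes the repeated document lookup, not the per-character cost.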
>         if boldStart == -1:
>             boldStart=i
>         else:
>             boldEnd= i
>     else:
>         if boldEnd != -1:
>             boldRanges.append((boldStart,boldEnd))
>             boldStart= -1
>             boldEnd = -1
> t2 = time.clock()
> MSWord.Quit()
>
> print boldRanges #see what we got
> print "Analysed in ",t2-t1
> # END =====================================
>
> Thanks in advance
--
- Eric Brunel <eric dot brunel at pragmadev dot com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com