Opening MS Word files via Python

jmdeschamps jmdeschamps at cvm.qc.ca
Wed Apr 21 09:36:24 EDT 2004


Rob Nikander <rnikaREMOVEnder at adelphia.net> wrote in message news:<i7-dnZNwpJ8TfhjdRVn-jg at adelphia.com>...
> Fazer wrote:
> > I am curious as to how I should approach this issue.  I would just
> > want to parse simple text and maybe perhaps tables in the future. 
> > Would I have to save the word file and open it in a text editor?  That
> > would kind of....suck...  Has anyone else tackled this issue?
> 
> The win32 extensions for python allow you to get at the COM objects for 
> applications like Word, and that would let you get the text and tables. 
>   google: win32 python.
> 
> word = win32com.client.Dispatch('Word.Application')
> word.Documents.Open('C:\\myfile.doc')
> 
> But I don't know the best way to find out the methods and properties of 
> the "word" object.
> 
> Rob

You can use the VBA documentation for Word and, with dot notation and
the normal Python way of calling functions, play with its various
objects, methods and attributes...
Here's some pretty straightforward code along these lines:
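#************************
import win32com.client
import tkFileDialog  # Python 2 name; on Python 3: from tkinter import filedialog

# Launch Word (hidden)
MSWord = win32com.client.Dispatch("Word.Application")
MSWord.Visible = 0
# Ask the user for a file and open it; Open() returns the Document object
myWordDoc = tkFileDialog.askopenfilename()
doc = MSWord.Documents.Open(myWordDoc)
# Get the textual content: Content is a Range, its Text property is the string
docText = doc.Content.Text
# Get the collection of tables (indexed from 1, e.g. listTables(1))
listTables = doc.Tables
# When finished: doc.Close(0) then MSWord.Quit(), or the hidden Word keeps running
#************************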
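
For the tables, here is a rough sketch of how the cells could be pulled
into plain Python lists, and the document text split into paragraphs.
The helper names (clean_cell_text, table_to_rows, split_paragraphs) are
my own, not part of win32com or Word. It relies only on the Word object
model members Rows.Count, Columns.Count, Cell(row, col) and Range.Text,
and it assumes a regular grid: tables with merged cells may make Cell()
raise an error, which this sketch does not handle.

```python
def clean_cell_text(raw):
    """Strip the end-of-cell marker Word appends to a cell's Range.Text.

    Word terminates the text of each table cell with a carriage return
    followed by chr(7); neither belongs to the visible content.
    """
    return raw.rstrip("\r\x07")


def table_to_rows(table):
    """Return the cells of a Word Table object as a list of lists of strings.

    Assumption: the table is a regular grid (no merged cells).
    """
    rows = []
    # Cell() coordinates in the Word object model start at 1, not 0.
    for r in range(1, table.Rows.Count + 1):
        row = []
        for c in range(1, table.Columns.Count + 1):
            row.append(clean_cell_text(table.Cell(r, c).Range.Text))
        rows.append(row)
    return rows


def split_paragraphs(text):
    """Split a Range.Text string into non-empty paragraphs.

    Word separates paragraphs with a carriage return rather than a newline.
    """
    return [p for p in text.split("\r") if p.strip()]


# Hypothetical usage with an open Word instance:
#   rows = table_to_rows(MSWord.ActiveDocument.Tables(1))
#   paras = split_paragraphs(MSWord.ActiveDocument.Content.Text)
```

Keeping the helpers free of any win32com import means they can be tried
out on small stand-in objects without Word installed.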

Happy parsing,

Jean-Marc


