MS Word parser

Tim Golden mail at timgolden.me.uk
Wed Jun 13 04:28:40 EDT 2007


kenicheema at gmail.com wrote:
> Hi all,
> I'm currently using antiword to extract content from MS Word files.
> Is there another way to do this without relying on any command prompt
> application?

Well, you haven't said what your environment is, but is
there anything stopping you from controlling Word itself
via COM (on Windows, with Word installed)? I'm no Word
expert, but from looking around, this seems to work:

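<code>
import win32com.client

# Start (or attach to) Word through its COM automation interface
word = win32com.client.Dispatch("Word.Application")
doc = word.Documents.Open("c:/temp/temp.doc")
try:
    # Range() with no arguments spans the whole main document body
    text = doc.Range().Text
finally:
    doc.Close(False)  # close without saving any changes
    word.Quit()

with open("c:/temp/temp.txt", "w", encoding="utf-8") as f:
    f.write(text)
</code>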
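The text that comes back from `Range().Text` usually contains Word's own control characters rather than plain newlines: a carriage return (`\r`) for each paragraph mark, `\x0b` for a manual line break, `\x0c` for a page or section break, and `\r\x07` at the end of each table cell. A minimal cleanup sketch, assuming those conventions (the function name `clean_word_text` is hypothetical, and it just puts each table cell on its own line rather than trying to rebuild the table), might look like this:

```python
def clean_word_text(raw: str) -> str:
    """Turn Word's control characters into plain newlines.

    Assumption: the usual Word conventions, where '\\r' marks a paragraph,
    '\\x0b' a manual line break, '\\x0c' a page/section break, and '\\r\\x07'
    the end of a table cell.
    """
    text = raw.replace("\r\x07", "\n")  # end-of-cell marker: one cell per line
    text = text.replace("\x07", "")     # drop any stray cell/row markers
    text = text.replace("\x0b", "\n")   # manual line break (Shift+Enter)
    text = text.replace("\x0c", "\n")   # page or section break
    text = text.replace("\r", "\n")     # ordinary paragraph mark
    return text


# Hypothetical sample resembling what Range().Text can return
sample = "Title\rFirst line\x0bsecond line\rCell A\r\x07Cell B\r\x07\x0cNext page\r"
print(clean_word_text(sample))
```

In the COM snippet you would call `clean_word_text(text)` just before writing the file. The `\r\x07` replacement has to run before the bare `\r` one, otherwise each cell marker would leave a stray `\x07` behind.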

TJG



More information about the Python-list mailing list