How to convert .doc file to .txt in Python

Tim Golden mail at timgolden.me.uk
Thu Apr 9 06:53:39 EDT 2015


On 09/04/2015 11:25, subhabrata.banerji at gmail.com wrote:
> Dear Group,
> 
> I am trying to convert a .doc file to a .txt file.
> 
> I looked at python-docx and zipfile, but they do not seem to help me much.
> 
> Could you kindly suggest how to convert from .doc to
> .docx/.html/.pdf/.rtf? I am able to convert those formats to
> .txt.
> 
> I would be grateful if any of the Python experts could help me.


There are several approaches, but this one will work (assuming you are
on Windows and have Microsoft Word and the pywin32 package installed):

<code>
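import os
import win32com.client

# Works for .doc and .docx alike, because Word itself opens the file
DOC_FILEPATH = "c:/temp/something.doc"
doc = win32com.client.GetObject(DOC_FILEPATH)
try:
    # Word ends each paragraph with a bare "\r"; use Windows "\r\n" line endings
    text = doc.Range().Text.replace("\r", "\r\n")
finally:
    # Close the document without saving, so Word releases the file
    doc.Close(False)

#
# do something with the text...
#
with open("something.txt", "wb") as f:
    f.write(text.encode("utf-8"))

# Open the result in the default viewer for .txt files (Windows only)
os.startfile("something.txt")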

</code>
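Word can also do the other conversions you asked about (.docx, .html, .pdf, .rtf) through its SaveAs method. Below is a minimal sketch, assuming Windows, a reasonably recent Microsoft Word, and pywin32. The FileFormat numbers are values from Word's WdSaveFormat enumeration; the `convert_with_word` name and the extension table are made up for illustration.

```python
import os

# Hypothetical table: output extension -> Word WdSaveFormat value
WD_SAVE_FORMATS = {
    ".docx": 16,  # wdFormatDocumentDefault
    ".rtf": 6,    # wdFormatRTF
    ".html": 8,   # wdFormatHTML
    ".pdf": 17,   # wdFormatPDF
    ".txt": 7,    # wdFormatUnicodeText
}


def convert_with_word(src_path, dest_path):
    """Convert src_path to the format implied by dest_path's extension, using Word."""
    ext = os.path.splitext(dest_path)[1].lower()
    if ext not in WD_SAVE_FORMATS:
        raise ValueError("unsupported output format: %r" % ext)

    # Imported here so this module still loads on machines without pywin32
    import win32com.client

    # DispatchEx starts a separate Word instance, so Quit() won't close the user's Word
    word = win32com.client.DispatchEx("Word.Application")
    try:
        # Word resolves relative paths against its own folder, so pass absolute ones
        doc = word.Documents.Open(os.path.abspath(src_path))
        try:
            doc.SaveAs(os.path.abspath(dest_path), FileFormat=WD_SAVE_FORMATS[ext])
        finally:
            doc.Close(False)  # don't save changes to the source document
    finally:
        word.Quit()
```

For example, `convert_with_word("c:/temp/something.doc", "c:/temp/something.pdf")`.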

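As for why zipfile and python-docx did not help: they only understand .docx, which is a zip archive of XML files, whereas .doc is Word's older binary format. Once a file is in .docx form (converted with Word, for instance), the standard library alone can pull out the body text. This is a rough sketch, assuming the body lives in `word/document.xml` (its standard location). It ignores headers, footnotes, tabs and manual line breaks, and `docx_to_text` is a made-up name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text is split across <w:t> elements
    for p in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

For example, `print(docx_to_text("c:/temp/something.docx"))`. With python-docx installed, joining the `text` of each item in `docx.Document(path).paragraphs` gives much the same result.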
TJG



More information about the Python-list mailing list