How to convert .doc file to .txt in Python

subhabrata.banerji at gmail.com
Thu Apr 9 10:40:04 EDT 2015


On Thursday, April 9, 2015 at 4:23:55 PM UTC+5:30, Tim Golden wrote:
> On 09/04/2015 11:25, wrote:
> > Dear Group,
> > 
> > I was trying to convert .doc file to .txt file.
> > 
> > I found python-docx and zipfile, but they do not seem to help me much.
> > 
> > Could you kindly suggest how to convert from .doc to
> > .docx/.html/.pdf/.rtf? From any of those I am able to convert to
> > .txt.
> > 
> > I would be grateful if any of the Python experts could help.
> 
> 
> There are several approaches, but this one will work (assuming you are
> on Windows and have the pywin32 package installed):
> 
> <code>
> import os
> import win32com.client
> 
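> # Path to the Word document to read
> DOC_FILEPATH = "c:/temp/something.docx"
> # Ask COM for the object behind the file; this opens it in Word
> doc = win32com.client.GetObject(DOC_FILEPATH)
> # Range() with no arguments spans the whole document
> text = doc.Range().Text
> 
> #
> # do something with the text...
> #
> # Write the text out as UTF-8 bytes
> with open("something.txt", "wb") as f:
>     f.write(text.encode("utf-8"))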
> 
> os.startfile("something.txt")
> 
> </code>
> 
> TJG

Thanks, Tim. It is slightly better than my solution.
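
For the other formats asked about above (.docx/.html/.pdf/.rtf), the same COM route can probably be pushed a little further by letting Word save the file itself. What follows is only a minimal sketch, assuming Windows, an installed copy of Word, and pywin32. The names WD_FORMATS, target_path and convert_doc are made up for illustration; the numbers are Word's WdSaveFormat values.

```python
import os

# Word's WdSaveFormat values, keyed by the extension we want to produce.
WD_FORMATS = {
    ".txt": 2,    # wdFormatText
    ".rtf": 6,    # wdFormatRTF
    ".html": 8,   # wdFormatHTML
    ".docx": 12,  # wdFormatXMLDocument
    ".pdf": 17,   # wdFormatPDF
}


def target_path(doc_path, ext):
    """Return an absolute output path: same folder and stem, new extension."""
    stem, _ = os.path.splitext(os.path.abspath(doc_path))
    return stem + ext


def convert_doc(doc_path, ext=".docx"):
    """Open doc_path in Word and save a copy in the format implied by ext."""
    if ext not in WD_FORMATS:
        raise ValueError("unsupported extension: %r" % ext)

    # Imported here so the helpers above can be used without pywin32.
    import win32com.client

    out_path = target_path(doc_path, ext)
    # DispatchEx starts a separate Word instance, so Quit() below does not
    # close a Word window the user may already have open.
    word = win32com.client.DispatchEx("Word.Application")
    word.Visible = False
    try:
        # Word resolves relative paths against its own working directory,
        # so always hand it absolute paths.
        doc = word.Documents.Open(os.path.abspath(doc_path))
        try:
            # Second positional argument of SaveAs is FileFormat.
            doc.SaveAs(out_path, WD_FORMATS[ext])
        finally:
            doc.Close(False)  # False: discard changes without prompting
    finally:
        word.Quit()
    return out_path
```

Something like convert_doc("c:/temp/something.doc", ".docx") should then leave something.docx next to the original, and ".txt" would skip the intermediate format altogether.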


