Converting .doc to .txt in Linux

patrick.waldo at gmail.com
Thu Sep 4 15:54:38 EDT 2008


Hi Everyone,

I previously asked a similar question:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/2953d6d5d8836c4b/9dc901da63d8d059?lnk=gst&q=convert+doc+txt#9dc901da63d8d059

At that point I was using Windows, but now I am using Linux.
Basically, I have some .doc files that I need to convert into .txt
files encoded in UTF-8.  However, win32com.client doesn't work on
Linux.

It's been giving me quite a headache all day.  Any ideas would be
greatly appreciated.

Best,
Patrick

# Windows code (needs pywin32 and Microsoft Word, so it cannot run on Linux):
import glob, os, shutil
from win32com.client import Dispatch

pattern = '/home/pwaldo2/work/workbench/current_documents/*.doc'
outpath = '/home/pwaldo2/work/workbench/current_documents/TXT/'

# Start Word once, then re-save each document in place as text.
WordApp = Dispatch("Word.Application")
WordApp.Visible = 1
for doc in glob.glob(pattern):
    WordApp.Documents.Open(doc)
    WordApp.ActiveDocument.SaveAs(doc, 7)  # 7 = wdFormatUnicodeText
    WordApp.ActiveDocument.Close()
WordApp.Quit()

# Copy each re-saved file into TXT/ under a .txt name.
for doc in glob.glob(pattern):
    # Use the bare file name; joining outpath with a full path would discard outpath.
    txt_doc = os.path.splitext(os.path.basename(doc))[0] + '.txt'
    txt_doc_path = os.path.join(outpath, txt_doc)
    shutil.copy(doc, txt_doc_path)
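
On Linux, one possible route is to shell out to a command-line converter
instead of Word.  The sketch below assumes the antiword tool is installed
and on PATH, and that its UTF-8.txt character mapping file is available
(it normally ships with the antiword package).  The helper names
txt_path_for, antiword_command and convert_all are made up for
illustration.  It has not been tried against these particular files and
will only handle .doc files that antiword itself can read.

```python
import glob
import os
import subprocess


def txt_path_for(doc_path, out_dir):
    """Map /some/dir/report.doc to <out_dir>/report.txt."""
    stem = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(out_dir, stem + '.txt')


def antiword_command(doc_path):
    """Argument list asking antiword to write UTF-8 text to stdout.

    Assumption: antiword is installed and can find its UTF-8.txt mapping file.
    """
    return ['antiword', '-m', 'UTF-8.txt', doc_path]


def convert_all(input_dir, out_dir):
    """Convert every .doc in input_dir to a UTF-8 .txt file in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    converted = []
    for doc_path in sorted(glob.glob(os.path.join(input_dir, '*.doc'))):
        # check=True raises CalledProcessError if antiword rejects the file.
        result = subprocess.run(antiword_command(doc_path),
                                stdout=subprocess.PIPE, check=True)
        txt_path = txt_path_for(doc_path, out_dir)
        # antiword already emitted UTF-8 bytes, so write them unchanged.
        with open(txt_path, 'wb') as out:
            out.write(result.stdout)
        converted.append(txt_path)
    return converted
```

Calling convert_all with the current_documents directory and its TXT/
subdirectory from the Windows code would then write one .txt file per
.doc file into TXT/.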


