Converting .doc to .txt in Linux

Chris Rebert clp at rebertia.com
Thu Sep 4 16:16:23 EDT 2008


I'd recommend using one of the existing Word-to-text converters for
Linux and just running it from a shell script:
* wvWare: http://wvware.sourceforge.net/
* Antiword: http://www.winfield.demon.nl/

No compelling reason to use Python in this instance. Right tool for
the right job and all that.
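
That said, if the conversion has to be driven from an existing Python
script anyway, here is a minimal sketch that just shells out to
antiword (the second link above). It assumes antiword is installed and
on the PATH, and that its "-m UTF-8.txt" mapping option is available
for UTF-8 output; the helper names (txt_path_for, antiword_command,
convert, convert_all) are made up for illustration and untested
against your files.

```python
import glob
import os
import subprocess


def txt_path_for(doc_path, out_dir):
    """Return out_dir/<basename>.txt for a given .doc path."""
    base = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(out_dir, base + ".txt")


def antiword_command(doc_path):
    """Build the antiword command line for one document."""
    # Assumption: -m selects antiword's character mapping file, and
    # UTF-8.txt is one of the mapping files shipped with it.
    return ["antiword", "-m", "UTF-8.txt", doc_path]


def convert(doc_path, out_dir):
    """Run antiword on one file and write its stdout to a .txt file."""
    out_path = txt_path_for(doc_path, out_dir)
    with open(out_path, "wb") as out:
        # check_call raises CalledProcessError if antiword fails.
        subprocess.check_call(antiword_command(doc_path), stdout=out)
    return out_path


def convert_all(pattern, out_dir):
    """Convert every file matching the glob pattern; return the new paths."""
    return [convert(doc, out_dir) for doc in sorted(glob.glob(pattern))]
```

Calling convert_all() with your *.doc pattern and your TXT/ directory
(which must already exist) would do the whole batch. A three-line
shell loop around the same command does exactly the same thing, which
is really my point.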

- Chris

On Thu, Sep 4, 2008 at 12:54 PM,  <patrick.waldo at gmail.com> wrote:
> Hi Everyone,
>
> I had previously asked a similar question,
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/2953d6d5d8836c4b/9dc901da63d8d059?lnk=gst&q=convert+doc+txt#9dc901da63d8d059
>
> but at that point I was using Windows and now I am using Linux.
> Basically, I have some .doc files that I need to convert into txt
> files encoded in utf-8.  However, win32com.client doesn't work in
> Linux.
>
> It's been giving me quite a headache all day.  Any ideas would be
> greatly appreciated.
>
> Best,
> Patrick
>
> #Windows Code:
> import glob, os, shutil
> from win32com.client import Dispatch
>
> pattern = '/home/pwaldo2/work/workbench/current_documents/*.doc'
> outpath = '/home/pwaldo2/work/workbench/current_documents/TXT/'
>
> # Start Word once, re-save each document in format 7, then quit.
> WordApp = Dispatch("Word.Application")
> WordApp.Visible = 1
> for doc in glob.glob(pattern):
>    WordApp.Documents.Open(doc)
>    WordApp.ActiveDocument.SaveAs(doc,7)
>    WordApp.ActiveDocument.Close()
> WordApp.Quit()
>
> # Copy each re-saved file into outpath under a .txt name.
> for doc in glob.glob(pattern):
>    # glob returns full paths, so keep only the file name here;
>    # otherwise os.path.join would ignore outpath.
>    txt_name = os.path.splitext(os.path.basename(doc))[0] + '.txt'
>    shutil.copy(doc, os.path.join(outpath, txt_name))



-- 
Follow the path of the Iguana...
http://rebertia.com


