Converting .doc to .txt in Linux

Cameron Simpson cs at zip.com.au
Thu Sep 4 23:53:44 EDT 2008


On 04Sep2008 12:54, patrick.waldo at gmail.com <patrick.waldo at gmail.com> wrote:
| I had previously asked a similar question,
| http://groups.google.com/group/comp.lang.python/browse_thread/thread/2953d6d5d8836c4b/9dc901da63d8d059?lnk=gst&q=convert+doc+txt#9dc901da63d8d059
| 
| but at that point I was using Windows and now I am using Linux.
| Basically, I have some .doc files that I need to convert into txt
| files encoded in utf-8.  However, win32com.client doesn't work in
| Linux.

I use the "antiword" or "catdoc" commands to convert .doc to text.
Call them from Python via popen or the subprocess module, if you must
use Python (I'd just write a shell script for such a task myself unless
it's embedded in a larger Python context).
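
For the Python route, a minimal sketch of the subprocess approach might
look like the following. It assumes antiword is installed and on your
PATH, and that "-m UTF-8.txt" selects its UTF-8 character mapping (that
is how I read antiword(1); check your local man page). The names
doc_to_text and convert_file are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: antiword is on PATH and "-m UTF-8.txt" picks its UTF-8
# mapping file. Swap in another converter command line if you prefer.
ANTIWORD = ("antiword", "-m", "UTF-8.txt")


def doc_to_text(doc_path, converter=ANTIWORD):
    """Run the converter on doc_path and return what it prints, as str."""
    result = subprocess.run(
        [*converter, str(doc_path)],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the converter fails
    )
    return result.stdout.decode("utf-8")


def convert_file(doc_path, converter=ANTIWORD):
    """Write foo.txt (UTF-8) next to foo.doc and return the new path."""
    doc_path = Path(doc_path)
    txt_path = doc_path.with_suffix(".txt")
    txt_path.write_text(doc_to_text(doc_path, converter), encoding="utf-8")
    return txt_path
```

Calling convert_file("report.doc") should leave a report.txt beside the
original. The converter is passed as an argument list rather than a shell
string, so filenames with spaces need no quoting.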

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Please do not send me Microsoft Word files.
http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents


