[Tutor] Convert doc to txt on Ubuntu

Carnell, James E jecarnell at saintfrancis.com
Wed Sep 16 21:03:50 CEST 2009


I need to access the text in hundreds of Microsoft .doc files on
an Ubuntu OS. I looked at win32, but only saw support for Windows. I am
going through all of these files to create a fairly simple delimited
text file for a spreadsheet. Options I have considered:

A) Batch convert to text files so I can access them
B) import some module that allows me to decode this format
C) OpenOffice allows batch conversion to .odt, but I still don't know how
to access the text
D) Buy a 24 pack, some Twinkies, and go watch David Hasselhoff reruns
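
Option A can be sketched roughly as below. This is only a minimal sketch
under assumptions: it assumes the antiword command-line tool is installed
(for example from the Ubuntu package of the same name) and that running
"antiword file.doc" prints the document text to standard output. The
function name convert_docs and its parameters are made up for illustration,
and it uses subprocess.run and pathlib, which need a newer Python 3 than
3.0.

```python
import subprocess
from pathlib import Path


def convert_docs(src_dir, dst_dir, command=("antiword",)):
    """Run `command <file.doc>` for every .doc file in src_dir and save
    whatever the command prints to stdout as a .txt file in dst_dir.

    `command` is an assumption: any converter that takes the input path
    as its last argument and writes plain text to stdout should work.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(src.glob("*.doc")):
        # check=True raises CalledProcessError if the converter fails.
        result = subprocess.run(
            list(command) + [str(doc)], capture_output=True, check=True
        )
        out = dst / (doc.stem + ".txt")
        # Keep the converter's bytes untouched; decode later when reading.
        out.write_bytes(result.stdout)
        written.append(out)
    return written
```

Something like convert_docs("docs", "text") would then leave one .txt file
per .doc file, and those can be opened the same way as any other text file.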

Opening .txt documents works fine.

With a .doc file, I currently get:

inFile = open("myTestFile.doc", "r")
testRead = inFile.read()

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    test = inFile.read()
  File "/usr/lib/python3.0/io.py", line 1728, in read
    decoder.decode(self.buffer.read(), final=True))
  File "/usr/lib/python3.0/io.py", line 1299, in decode
    output = self.decoder.decode(input, final=final)
  File "/usr/lib/python3.0/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
invalid data
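
A note on the traceback: open(..., "r") in Python 3 opens the file in text
mode and tries to decode it as text, while a binary Word .doc file is not
text. Older .doc files are OLE2 compound files, which start with the bytes
D0 CF 11 E0 A1 B1 1A E1, and that would be consistent with the decoder
failing at position 0-1. A minimal sketch, assuming the files are that
kind of .doc; the helper name looks_like_ole2 is made up for illustration.
Opening with "rb" returns raw bytes and never raises UnicodeDecodeError,
but it does not extract the document text by itself.

```python
# Signature at the start of an OLE2 compound file (used by binary .doc).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(path):
    """Return True if the file starts with the OLE2 signature."""
    # "rb" reads bytes without any text decoding.
    with open(path, "rb") as f:
        return f.read(8) == OLE2_MAGIC
```

A check like this only tells a binary .doc apart from a plain text file
that happens to have a .doc extension; getting at the text still needs a
converter or a module that understands the format.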

Any help is greatly appreciated. Thanks bunches.



