[Tutor] Convert doc to txt on Ubuntu
Carnell, James E
jecarnell at saintfrancis.com
Wed Sep 16 21:03:50 CEST 2009
I need to access the text in hundreds of Microsoft .doc files on
an Ubuntu OS. I looked at win32, but only saw support for Windows. I am
going through all of these files to create a fairly simple delimited
text file for a spreadsheet. The options I can think of:
A) Batch convert to text files so I can access them
B) Import some module that allows me to decode this format
C) OpenOffice allows batch conversion to .odt, but I still don't know how
to access the text in those files
D) Buy a 24 pack, some Twinkies, and go watch David Hasselhoff reruns
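Option A could be sketched roughly as below. This is only a sketch under assumptions: it presumes a command-line converter such as antiword (or catdoc) is installed and prints the document's plain text to standard output when given a .doc path. The function name convert_docs and its parameters are made up for illustration.

```python
import subprocess
from pathlib import Path


def convert_docs(src_dir, dst_dir, converter=("antiword",)):
    """Run `converter <file.doc>` on every .doc in src_dir and save
    whatever the converter prints to stdout as a .txt file in dst_dir.

    Assumption: the converter writes plain text to stdout and exits
    with status 0 on success.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(src.glob("*.doc")):
        result = subprocess.run(
            [*converter, str(doc)],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        if result.returncode != 0:
            # Leave files the converter cannot handle for manual review.
            print("skipped", doc.name)
            continue
        out = dst / (doc.stem + ".txt")
        out.write_bytes(result.stdout)
        written.append(out)
    return written
```

Calling convert_docs("docs", "txt") would then leave one .txt file per .doc file, and those can be opened the usual way.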
Opening .txt documents works fine.
With a .doc file I currently get:
inFile = open("myTestFile.doc", "r")
testRead = inFile.read()
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    test = inFile.read()
  File "/usr/lib/python3.0/io.py", line 1728, in read
    decoder.decode(self.buffer.read(), final=True))
  File "/usr/lib/python3.0/io.py", line 1299, in decode
    output = self.decoder.decode(input, final=final)
  File "/usr/lib/python3.0/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data
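The error itself comes from opening a binary file in text mode: Python 3 tries to decode the bytes as UTF-8 and fails on the first ones. A minimal sketch of reading the file in binary mode instead is below. It assumes the files are legacy Word documents stored in the OLE compound file container, which begins with the eight bytes D0 CF 11 E0 A1 B1 1A E1. The helper name looks_like_ole_doc is made up for illustration, and this only identifies the container; it does not extract any text.

```python
# Signature that starts an OLE compound file (the container used by
# legacy .doc files).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole_doc(path):
    """Return True if the file starts with the OLE compound file signature."""
    # "rb" reads raw bytes, so no UTF-8 decoding is attempted.
    with open(path, "rb") as f:
        return f.read(len(OLE_MAGIC)) == OLE_MAGIC
```

Reading with "rb" avoids the UnicodeDecodeError, but the bytes are still in Word's binary format, so a converter or a parsing module is needed to get at the text.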
Any help is greatly appreciated. Thanks bunches.