Problem Converting Word to UTF8 Text File
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Sun Oct 21 13:02:23 EDT 2007
On Sun, 21 Oct 2007 13:35:43 -0300, <patrick.waldo at gmail.com> wrote:
> Hi all,
>
> I'm trying to copy a bunch of microsoft word documents that have
> unicode characters into utf-8 text files. Everything works fine at
> the beginning. The word documents get converted and new utf-8 text
> files with the same name get created. And then I try to copy the data
> and I keep on getting "TypeError: coercing to Unicode: need string or
> buffer, instance found". I'm probably copying the word document
> wrong. What can I do?
Always remember to provide the full traceback.
Where do you get the error? In the last line: shutil.copyfile?
If the file already contains the text in utf-8, and you just want to make
a copy, use shutil.copy as before.
(or, why not tell Word to save the file using the .txt extension in the
first place?)
> for doc in glob.glob(input):
>     txt_split = os.path.splitext(doc)
>     txt_doc = txt_split[0] + '.txt'
>     txt_doc = codecs.open(txt_doc,'w','utf-8')
>     shutil.copyfile(doc,txt_doc)
shutil.copyfile expects path names (strings) as both arguments, not a
codecs-wrapped file-like object. That is where the TypeError comes from.
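
A minimal sketch of the loop body with that fixed, assuming the source file
already holds UTF-8 text and a byte-for-byte copy is all that is wanted. The
helper name copy_to_txt and the sample file are made up for illustration;
they are not from the original script. Note that a real Word .doc is a binary
format, so copying its bytes will not turn it into text.

```python
import os
import shutil
import tempfile

def copy_to_txt(src_path):
    """Copy src_path byte-for-byte to a sibling file with a .txt extension.

    Hypothetical helper: both arguments given to shutil.copyfile are
    plain path strings, never open file objects.
    """
    base, _ext = os.path.splitext(src_path)
    txt_path = base + '.txt'   # keep this a string; do not open it
    shutil.copyfile(src_path, txt_path)
    return txt_path

# Demonstration on a throwaway file that already contains UTF-8 text.
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, 'sample.doc')
with open(src, 'wb') as f:
    f.write(u'caf\u00e9'.encode('utf-8'))

dst = copy_to_txt(src)
with open(dst, 'rb') as f:
    copied = f.read()
```

In the original loop this would amount to dropping the codecs.open line and
passing txt_split[0] + '.txt' straight to shutil.copyfile.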
--
Gabriel Genellina