Problem Converting Word to UTF8 Text File

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Sun Oct 21 13:02:23 EDT 2007


On Sun, 21 Oct 2007 13:35:43 -0300, <patrick.waldo at gmail.com> wrote:

> Hi all,
>
> I'm trying to copy a bunch of Microsoft Word documents that have
> Unicode characters into UTF-8 text files.  Everything works fine at
> the beginning: the Word documents get converted and new UTF-8 text
> files with the same names get created.  But when I try to copy the data,
> I keep getting "TypeError: coercing to Unicode: need string or
> buffer, instance found".  I'm probably copying the Word document
> wrong.  What can I do?

Always remember to provide the full traceback.
Where do you get the error? In the last line, shutil.copyfile?
If the file already contains the text in UTF-8 and you just want to make
a copy, use shutil.copy as before.
(Or, why not tell Word to save the file as plain text with the .txt
extension in the first place?)
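A minimal sketch of the shutil.copy approach, assuming the matched files already contain UTF-8 text (a binary .doc file copied this way is not converted; its bytes are copied unchanged under a new extension). The function name copy_as_txt and its pattern argument are hypothetical; pattern plays the role of input in the original loop, and the destination is kept as a path string throughout.

```python
import glob
import os
import shutil


def copy_as_txt(pattern):
    """Copy every file matching `pattern` to a same-named .txt file.

    Hypothetical helper. Assumes the matched files already hold UTF-8 text:
    the bytes are copied as-is, with no decoding or re-encoding. The pattern
    should not match .txt files themselves, or a file would be copied onto
    itself.
    """
    created = []
    for src in sorted(glob.glob(pattern)):
        # Build the destination *path*; never replace it with an open file.
        dst = os.path.splitext(src)[0] + '.txt'
        shutil.copy(src, dst)  # both arguments are path names
        created.append(dst)
    return created
```

Called as copy_as_txt('*.doc'), it returns the list of .txt paths it wrote.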

> for doc in glob.glob(input):
>     txt_split = os.path.splitext(doc)
>     txt_doc = txt_split[0] + '.txt'
>     txt_doc = codecs.open(txt_doc,'w','utf-8')
>     shutil.copyfile(doc,txt_doc)

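shutil.copyfile expects path names as arguments, not a
codecs-wrapped file-like object.  Note also that txt_doc is rebound from
the .txt path to the object returned by codecs.open, so the path itself
is lost.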
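To see the difference, here is a small sketch that reproduces the mistake and shows the file-object alternative. It assumes Python 3, where the built-in open(..., encoding='utf-8') stands in for codecs.open (any open file object passed as a path triggers a TypeError; the exact message varies by Python version). The helper names are hypothetical. If you already hold file objects, shutil.copyfileobj is the counterpart of copyfile that accepts them; opening both files in binary mode copies the bytes unchanged.

```python
import os
import shutil


def show_original_error(doc_path):
    """Pass an open file object where copyfile expects a path; return the error."""
    txt_path = os.path.splitext(doc_path)[0] + '.txt'
    # Like codecs.open in the original loop, this yields a file object, not a path.
    with open(txt_path, 'w', encoding='utf-8') as txt_file:
        try:
            shutil.copyfile(doc_path, txt_file)
        except TypeError as exc:
            return exc
    return None


def copy_via_file_objects(doc_path):
    """Copy doc_path to a same-named .txt file using file objects throughout."""
    txt_path = os.path.splitext(doc_path)[0] + '.txt'
    # copyfileobj takes already-open files; binary mode keeps the bytes as-is.
    with open(doc_path, 'rb') as src, open(txt_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    return txt_path
```

The simplest fix is still to keep txt_doc as a path string and call shutil.copyfile(doc, txt_doc).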

-- 
Gabriel Genellina
