[python-win32] UnicodeEncodeError when printing a .doc file

Tim Roberts timr at probo.com
Tue Jun 14 19:36:01 CEST 2011


cool_go_blue wrote:
> I am trying to read a Word document as follows:
>
> import win32com.client
> app = win32com.client.Dispatch('Word.Application')
> doc = app.Documents.Open(r'D:\myfile.doc')
> print doc.Content.Text
>
> I receive the following error:
>
> Traceback (most recent call last):
>   File "D:\projects\Myself\MySVD\src\ReadWord.py", line 11, in <module>
>     print doc.Content.Text
>   File "D:\Softwares\Python27\lib\encodings\cp1252.py", line 12, in encode
>     return codecs.charmap_encode(input,errors,encoding_table)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\uf06d'
> in position 4397: character maps to <undefined>
>

You are reading the Word document just fine.  The issue is printing it
to your terminal.  The document contains Unicode characters that cannot
be represented in your terminal's code page (cp1252, according to the
traceback), so the implicit conversion from Unicode to 8-bit fails.  You
need to tell Python how to handle characters that have no cp1252
equivalent.  Try this:

    print doc.Content.Text.encode('cp1252','replace')

That will print a ? in place of each character that cp1252 cannot
represent.
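For anyone on Python 3, where str.encode returns bytes, here is a minimal sketch of the same idea.  It is not tied to Word: the sample string is a made-up stand-in for doc.Content.Text, and to_cp1252 is a hypothetical helper name.  It shows that the default "strict" handler raises the same UnicodeEncodeError, while "replace" and "backslashreplace" substitute instead.

```python
# Python 3 sketch: encode text for a cp1252 console without crashing
# on characters that cp1252 cannot represent.

def to_cp1252(text, errors="replace"):
    """Hypothetical helper: encode to cp1252 using the given error handler."""
    return text.encode("cp1252", errors)

# Made-up stand-in for doc.Content.Text; it contains U+F06D.
sample = "caf\u00e9 \uf06d value"

strict_failed = False
try:
    sample.encode("cp1252")  # default errors="strict"
except UnicodeEncodeError:
    strict_failed = True  # same error as in the traceback above

replaced = to_cp1252(sample)                     # unmappable char becomes b"?"
escaped = to_cp1252(sample, "backslashreplace")  # becomes the text \uf06d

print(replaced)
print(escaped)
```

"backslashreplace" is handy when you want to see which code point was dropped rather than a bare ?.  On Python 3.7 and later, sys.stdout.reconfigure(errors="replace") applies the same policy to every print call.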

U+F06D is not a standard character.  It's in the "private use" area
(U+E000 to U+F8FF), which has no assigned meaning, so it's probably a
special code to Word, possibly a character from a symbol font.
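To find where such characters occur, one option is to check each character's Unicode general category; private-use code points report "Co".  This is a sketch in Python 3 using a made-up sample string rather than a real document, and private_use_chars is a hypothetical helper name.

```python
import unicodedata

def private_use_chars(text):
    """Hypothetical helper: list (index, code point) for private-use chars."""
    return [(i, "U+%04X" % ord(ch))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Co"]  # "Co" = private use

# Made-up stand-in for doc.Content.Text.
sample = "caf\u00e9 \uf06d value"
found = private_use_chars(sample)
print(found)  # [(5, 'U+F06D')]
```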

-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.