printing from Word using win32com

Thu Jan 3 20:31:24 EST 2002

Frederick H. Bartlett wrote:

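> from win32com.client import Dispatch
> 
> def simpleSample():
>     # MYDIR is defined elsewhere in the original program.
>     myWord = Dispatch('Word.Application')
>     myWord.Visible = 0
> 
>     myDoc = myWord.Documents.Add(MYDIR + '\\sampFile.doc')
>     paras = myDoc.Paragraphs
>     numParas = paras.Count
>     # Word collections are 1-based: Item(1) .. Item(numParas).
>     for i in range(1, numParas + 1):
>         try:
>             # Range.Text comes back from COM as a Unicode string
>             print paras.Item(i).Range.Text.encode('utf-8')
>         except UnicodeError:
>             print "XXX There was a Unicode Error."
>     myWord.Quit(0)  # 0 = wdDoNotSaveChanges, so hidden Word won't prompt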
> 
> I think I understand Python objects; I'm sure I don't understand
> Microsoft objects. *sigh* Why should utf-8 work where latin-1
> doesn't?

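Because UTF-8 can handle *any* Unicode character you throw at it, whereas 
Latin-1 only covers U+0000 to U+00FF and raises a UnicodeError for 
anything beyond that. But UTF-8 probably does *not* do what you expect it 
to do: any character above 127 is encoded as 2, 3 or 4 bytes, each with 
the top bit set. Two bytes suffice for Latin-1 and most other ISO-8859 
character sets; characters such as the euro sign or the typographic 
quotes Word likes to insert take three, as do most Asian scripts, and 
characters outside the Basic Multilingual Plane take four.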
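A quick way to see this for yourself: the Python 3 sketch below (the thread itself uses Python 2, where `.encode` is called on `unicode` objects) counts the UTF-8 bytes for a few sample characters and shows that Latin-1 rejects anything above U+00FF. The function names `utf8_lengths` and `try_latin1` are made up for illustration.

```python
def utf8_lengths(text):
    """Return (character, code point, number of UTF-8 bytes) per character."""
    return [(ch, "U+%04X" % ord(ch), len(ch.encode("utf-8"))) for ch in text]


def try_latin1(text):
    """Encode as Latin-1, or return None if any character is above U+00FF."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:  # a subclass of UnicodeError
        return None


# 'A', e-acute, euro sign, left double quote, a character beyond the BMP
sample = "A\u00e9\u20ac\u201c\U00010348"
for ch, cp, n in utf8_lengths(sample):
    encoded = ch.encode("utf-8")
    # Every byte of a multi-byte sequence has the top bit (0x80) set.
    top_bits = all(b & 0x80 for b in encoded) if n > 1 else None
    print(cp, n, encoded, top_bits)

print(try_latin1("caf\u00e9"))               # fits in Latin-1
print(try_latin1("\u201cquoted\u201d"))      # typographic quotes do not
```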

If the output contains odd two-character sequences where you expected a 
single accented letter, such as 'Ã©' instead of 'é', that is a fairly 
reliable sign that the UTF-8 bytes are being interpreted as native 8-bit 
characters, since such pairs rarely occur in normal text.
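The pairs come from each UTF-8 byte being shown as a separate character. A minimal Python 3 sketch, assuming the console or viewer uses CP-1252: `as_misread` simulates the mix-up, and `looks_like_misread_utf8` is a rough round-trip heuristic (a swapped-in technique, not the eyeballing described above) that flags text whose CP-1252 bytes happen to form valid, non-ASCII UTF-8. Both names are hypothetical.

```python
def as_misread(text, codec="cp1252"):
    """Simulate UTF-8 bytes being displayed as native 8-bit characters."""
    # errors="replace": a few bytes (e.g. 0x81, 0x9D) are undefined in CP-1252
    return text.encode("utf-8").decode(codec, errors="replace")


def looks_like_misread_utf8(text, codec="cp1252"):
    """Heuristic: True if text, turned back into 8-bit bytes, is valid
    non-ASCII UTF-8, i.e. it was probably UTF-8 shown as 8-bit text."""
    try:
        raw = text.encode(codec)
    except UnicodeEncodeError:
        return False          # cannot have come from this 8-bit codec
    if all(b < 0x80 for b in raw):
        return False          # plain ASCII looks the same either way
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False          # genuine 8-bit text is rarely valid UTF-8
    return True


garbled = as_misread("caf\u00e9")
print(ascii(garbled))                       # 'caf\xc3\xa9', i.e. "cafÃ©"
print(looks_like_misread_utf8(garbled))
print(looks_like_misread_utf8("caf\u00e9"))
```

Like the eyeballing rule, this can give false positives on short strings, so treat a True result as a hint rather than proof.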

> I thought M$ used (sort of) latin 1?

It does, at least for Western European locales; the code page is called 
CP-1252. It matches Latin-1 except in the range 0x80-0x9F, where Latin-1 
has control codes and CP-1252 puts printable characters such as the euro 
sign and typographic quotes. But Win32 can also handle Unicode (which 
Microsoft calls 'wide characters'; in C that's wchar_t), so a character 
outside the CP-1252 set can quite easily get into a Word document, 
especially if the document originated on, for instance, a Windows system 
set to a different locale. (In that case the default Windows character 
set could be something like CP-1250, which covers Central and Eastern 
European languages.)
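To make the code-page differences concrete, here is a small Python 3 sketch using only the standard codecs: the same bytes decode differently under Latin-1, CP-1252 and CP-1250, and a character that fits one code page may not fit another. The helper names are illustrative; the last line shows one way (`errors="replace"`) to force Word text into CP-1252 for output, substituting `?` for anything that does not fit.

```python
CODECS = ("latin-1", "cp1252", "cp1250")


def decode_each(raw, codecs=CODECS):
    """Decode the same bytes with several 8-bit code pages."""
    return {c: raw.decode(c, errors="replace") for c in codecs}


def encodable(text, codec):
    """True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# 0x93: control code in Latin-1, left double quote in CP-1252 and CP-1250.
# 0xB9: superscript one in Latin-1 and CP-1252, 'a with ogonek' in CP-1250.
for codec, text in decode_each(b"\x93\xb9").items():
    print(codec, ascii(text))

polish = "\u0105"                      # a with ogonek, as used in Polish
print(encodable(polish, "cp1250"), encodable(polish, "cp1252"))

# Lossy fallback: characters missing from CP-1252 become '?'
print("a\u0105\u201c".encode("cp1252", errors="replace"))
```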

Robert Amesz