printing from Word using win32com
Robert Amesz
sheershion at mailexpire.com
Thu Jan 3 20:31:24 EST 2002
Frederick H. Bartlett wrote:
> def simpleSample():
>     myWord = Dispatch('Word.Application')
>     myWord.Visible = 0
>
>     myDoc = myWord.Documents.Add(MYDIR + '\\sampFile.doc')
>     something = myWord.ActiveDocument.Paragraphs
>     numParas = something.Count
>     i = 1
>     while i < numParas:
>         i = i + 1
>         try:
>             print something.Item(i).Range().encode('utf-8')
>         except UnicodeError:
>             print "XXX There was a Unicode Error."
>     myWord.Quit()
>
> I think I understand Python objects; I'm sure I don't understand
> Microsoft objects. *sigh* Why should utf-8 work where latin-1
> doesn't?
Because UTF-8 can handle *any* Unicode character you throw at it, while
Latin-1 only covers code points up to 255 and raises a UnicodeError for
anything beyond that. But UTF-8 probably does *not* do what you expect
it to do: any character above 127 is encoded as 2, 3 or 4 bytes, each
with the top bit set. For the characters of ISO-8859-1 (Latin-1) two
bytes will suffice; more exotic characters will require three or even
four.
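To make the byte counts concrete, here is a small sketch in current
Python 3 syntax (not the Python of the quoted code). The sample
characters and the helper names utf8_lengths, all_high_bits and
latin1_or_none are my own picks for illustration, nothing from Word or
win32com.

```python
# Sample characters chosen for illustration, one per UTF-8 length.
samples = {
    "A": "A",                # U+0041, plain ASCII
    "e-acute": "\u00e9",     # U+00E9, part of Latin-1
    "euro": "\u20ac",        # U+20AC, outside Latin-1
    "G clef": "\U0001d11e",  # U+1D11E, outside the 16-bit range
}


def utf8_lengths(chars):
    """Map each name to the number of bytes its character needs in UTF-8."""
    return {name: len(ch.encode("utf-8")) for name, ch in chars.items()}


def all_high_bits(ch):
    """True if every byte of the UTF-8 encoding has the top bit set."""
    return all(b & 0x80 for b in ch.encode("utf-8"))


def latin1_or_none(text):
    """Encode as Latin-1, or return None when a character does not fit."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


lengths = utf8_lengths(samples)
for name in samples:
    print(name, lengths[name])
```

The last helper shows the other half of the story: Latin-1 gives up on
the euro sign, which is the kind of failure the original latin-1 attempt
would have run into.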
If you find pairs of consecutive accented letters or odd symbols in the
output (such as 'Ã©' where you expected 'é'), that would be a fairly
reliable indication that UTF-8 characters are being interpreted as
native 8-bit characters, as those pairs rarely occur in normal text.
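A rough way to test for that situation, again in Python 3 and only as a
sketch: it assumes the native 8-bit character set is CP-1252, and the
function name looks_like_misread_utf8 is made up for this example. The
idea is to turn the text back into bytes and see whether those bytes
form valid UTF-8 that differs from what was displayed.

```python
def looks_like_misread_utf8(text, native="cp1252"):
    """Heuristic: does `text` look like UTF-8 bytes shown as `native`?"""
    try:
        repaired = text.encode(native).decode("utf-8")
    except UnicodeError:
        # Either the text does not fit the native code page, or its
        # bytes are not valid UTF-8: no sign of a misread.
        return False
    return repaired != text


# Simulate the mistake: encode as UTF-8, then read the bytes as CP-1252.
garbled = "caf\u00e9".encode("utf-8").decode("cp1252")

print(repr(garbled))
print(looks_like_misread_utf8(garbled))
print(looks_like_misread_utf8("caf\u00e9"))
```

It is a heuristic, not a proof: a short text could in principle contain
such a pair on purpose.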
> I thought M$ used (sort of) latin 1?
It does, at least for the most common locale. It's called CP-1252. But
Win32 is capable of handling Unicode (which they call 'wide
characters'; in C that's wchar_t), so it's not that unlikely that a
character outside the CP-1252 set could get into a Word document,
especially if the document originated from, for instance, a Windows
installation set to a different locale. (In that case the default
Windows character set could be something like CP-1250, which is Eastern
European, IIRC.)
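As an illustration of how a document from another locale can trip up
CP-1252, here is a Python 3 sketch. The byte value 0xE8 and the helper
fits_cp1252 are just my choices for the example.

```python
# The same byte means different characters in the two code pages.
raw = b"\xe8"
as_western = raw.decode("cp1252")  # U+00E8, e with grave accent
as_central = raw.decode("cp1250")  # U+010D, c with caron


def fits_cp1252(text):
    """True if every character of `text` exists in CP-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(fits_cp1252(as_western))
print(fits_cp1252(as_central))
print(as_central.encode("utf-8"))
```

Once the CP-1250 character has been turned into Unicode, it has no slot
in CP-1252, so an 8-bit encode fails while UTF-8 still works.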
Robert Amesz