Getting Properly Encoded Strings from Word into Python

Neil Hodgson nhodgson at bigpond.net.au
Fri Jan 18 16:34:37 EST 2002


Skip Montanaro:

> Isn't that what the encodings directory in the Python distribution is for?
> There are lots of Windows-looking modules there (cp1252.py and so forth).
> If none of them are appropriate, you can probably whip up your own using the
> VB translation table you mentioned.  I don't know how to use them.  If I was
> so inclined, I'd start with the docs for the codecs module.

   While you can write a mapping from the Windows Symbol character set to
Unicode, that is only part of Fred's problem, which (if I understand
correctly) includes retrieving a buffer that is encoded using different
character sets for different segments, with no indication of where the
segments are or which encoding each one uses. I just experimented with this,
and the alpha character gets pasted into other editors as 'a' :-( . The VBA
code mentioned sounds like one of the better solutions, although this should
also be possible from Python over COM.
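A minimal Python 3 sketch of both halves of the problem: a small charmap codec built with the codecs module (as Skip suggests), and a helper that applies it only to the segments set in the Symbol font. It assumes the text has already been split into (font name, text) runs, for example by VBA or by walking the document over COM; that step is not shown. The codec name symbol_sketch, the function decode_runs, and the mapping table are hypothetical and only partial (a few Greek letters); unmapped bytes simply pass through unchanged, and folding private-use characters U+F020-U+F0FF back to bytes is also an assumption about what Word may hand back.

```python
import codecs

# Hypothetical, partial table: Windows Symbol font byte -> Unicode.
# Only a few Greek letters are filled in; a real table would cover 0x20-0xFE.
SYMBOL_TO_UNICODE = {
    0x61: "\u03b1",  # 'a' -> GREEK SMALL LETTER ALPHA
    0x62: "\u03b2",  # 'b' -> beta
    0x64: "\u03b4",  # 'd' -> delta
    0x67: "\u03b3",  # 'g' -> gamma
    0x70: "\u03c0",  # 'p' -> pi
}

# 256-entry decoding table; unmapped bytes pass through as the same code point
# (an assumption made only to keep the sketch short).
DECODING_TABLE = "".join(SYMBOL_TO_UNICODE.get(b, chr(b)) for b in range(256))
ENCODING_MAP = codecs.charmap_build(DECODING_TABLE)


def _search(name):
    """Codec search function for the hypothetical 'symbol_sketch' codec."""
    if name != "symbol_sketch":
        return None
    return codecs.CodecInfo(
        name="symbol_sketch",
        encode=lambda s, errors="strict": codecs.charmap_encode(s, errors, ENCODING_MAP),
        decode=lambda b, errors="strict": codecs.charmap_decode(b, errors, DECODING_TABLE),
    )


codecs.register(_search)


def decode_runs(runs):
    """Join (font_name, text) runs, remapping only the runs set in Symbol.

    Assumption: Word may report a Symbol character either as plain ASCII ('a')
    or in the private-use range U+F020-U+F0FF, so both are folded to a byte
    before decoding. Other characters above U+00FF in a Symbol run raise
    ValueError.
    """
    out = []
    for font_name, text in runs:
        if font_name == "Symbol":
            raw = bytes(
                ord(ch) - 0xF000 if 0xF020 <= ord(ch) <= 0xF0FF else ord(ch)
                for ch in text
            )
            out.append(raw.decode("symbol_sketch"))
        else:
            # Non-Symbol runs are assumed to be ordinary Unicode text already.
            out.append(text)
    return "".join(out)
```

For example, decode_runs([("Times New Roman", "x = "), ("Symbol", "a")]) should return "x = α". The hard part Neil points out, recovering which segments are in which font, still has to come from Word itself.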

   Personally, I'd run away. Or save the document as RTF and write an RTF
parser. Or hire Mark Hammond as he is experienced with the horrors of Word.

   Neil
