character-filtering and Word (& company)

Tim Churches tchur at optushome.com.au
Fri Mar 25 17:57:45 EST 2005


Charles Hartman wrote:
> I'm working on text-handling programs that want plain-text files as
> input. It's fine to tell users to feed the programs with plain-text
> only, but not all users know what this means, even after you explain it,
> or they forget. So it would be nice to be able to handle gracefully the
> stuff that MS Word (or any word-processor) puts into a file. Inserting a
> 0-127 filter is easy but not very friendly. Typically, the w.p. file
> loads OK (into a wx.StyledTextCtrl, a.k.a. Scintilla editing pane), and
> will mostly be readable. Just a few characters will be wrong: "smart"
> quotation marks and the like.
> 
> Is there some well-known way to filter or translate this w.p. garbage? I
> don't know whether encodings are relevant; I don't know what encoding an
> MSW file uses. I don't see how to use s.translate() because I don't know
> how to predict what the incoming format will be.
> 
> Any hints welcome.

Antiword (a free reader that extracts the plain text from MS Word files)?
See http://www.winfield.demon.nl/
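
If antiword is installed, calling it from Python might look roughly like
the sketch below. The function name doc_to_text and the exe parameter are
made up for illustration. The "-m UTF-8.txt" option asks antiword for its
UTF-8 character mapping, and this assumes that mapping file shipped with
your installation; check the man page for your version.

```python
import shutil
import subprocess


def doc_to_text(path, exe="antiword"):
    """Return the text of a Word .doc file by shelling out to antiword.

    Hypothetical helper: assumes the antiword binary is on PATH and that
    its UTF-8.txt mapping file is installed.
    """
    if shutil.which(exe) is None:
        raise FileNotFoundError("cannot find %r on PATH" % exe)
    # check=True raises CalledProcessError if antiword rejects the file
    # (for example, because it is not really a Word document).
    result = subprocess.run(
        [exe, "-m", "UTF-8.txt", path],
        capture_output=True,
        check=True,
    )
    # errors="replace" keeps going even if the output is not clean UTF-8.
    return result.stdout.decode("utf-8", errors="replace")
```

Catching CalledProcessError and falling back to reading the file as plain
text would let the program accept both kinds of input.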

OpenOffice driven via PyUNO interface?
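
For the simpler case Charles describes, where the file is already mostly
readable and only the "smart" punctuation comes out wrong, encodings
probably are relevant. The sketch below assumes the bytes are
Windows-1252 (cp1252), which is common for text saved from Word on
Windows but is only a guess for any given file. The names PUNCTUATION
and plain_punctuation are invented for this example, and the table
covers only a handful of characters.

```python
# Map typographic characters to plain ASCII stand-ins.
# Keys are Unicode code points, as str.translate() expects.
PUNCTUATION = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark / apostrophe
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00A0: " ",    # no-break space
}


def plain_punctuation(raw, encoding="cp1252"):
    """Decode raw bytes and replace "smart" punctuation with ASCII.

    Assumption: the input is cp1252 unless the caller says otherwise.
    Bytes that are undefined in the encoding become U+FFFD rather than
    raising an error. Other non-ASCII characters (accented letters and
    so on) are left alone.
    """
    text = raw.decode(encoding, errors="replace")
    return text.translate(PUNCTUATION)
```

For example, plain_punctuation(b"\x93Hi\x94 \x96 it\x92s\x85") gives
'"Hi" - it\'s...'. This does nothing useful for a real binary .doc file;
that still needs a converter.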

Tim C


