character-filtering and Word (& company)

Charles Hartman charles.hartman at conncoll.edu
Fri Mar 25 17:54:05 EST 2005


I'm working on text-handling programs that want plain-text files as 
input. It's fine to tell users to feed the programs with plain-text 
only, but not all users know what this means, even after you explain 
it, or they forget. So it would be nice to be able to handle gracefully 
the stuff that MS Word (or any word-processor) puts into a file. 
Inserting a 0-127 filter is easy but not very friendly. Typically, the 
w.p. file loads OK (into a wx.StyledTextCtrl, a.k.a. Scintilla editing 
pane) and is mostly readable. Just a few characters will be wrong: 
"smart" quotation marks and the like.
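
For concreteness, the 0-127 filter could look something like the sketch 
below. The function name is made up for illustration, and it assumes 
the text has already been read into a Python 3 str. It shows why the 
approach is unfriendly: the curly quotes vanish instead of becoming 
plain quotes.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character whose code point is above 127.

    Hypothetical helper: it simply discards anything outside ASCII,
    so curly quotes, dashes, and accented letters are lost.
    """
    return "".join(ch for ch in text if ord(ch) < 128)


if __name__ == "__main__":
    sample = "\u201cHello,\u201d she said."
    # The curly quotes are removed, not replaced by straight ones.
    print(strip_non_ascii(sample))
```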

Is there some well-known way to filter or translate this w.p. garbage? 
I don't know whether encodings are relevant; I don't know what encoding 
an MSW file uses. I don't see how to use s.translate() because I don't 
know how to predict what the incoming format will be.
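
One possible shape for a translate-based filter is sketched below. It 
rests on an assumption that is not established here: that the incoming 
bytes are text in the Windows-1252 ("cp1252") code page, which is a 
common source of "smart" punctuation, and not a binary .doc file. The 
table contents, the function name and the default encoding are all 
illustrative choices, written for Python 3.

```python
# Map common "smart" punctuation to plain ASCII stand-ins.
# The selection of characters is illustrative, not exhaustive.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
})


def to_plain_text(raw: bytes, encoding: str = "cp1252") -> str:
    """Decode raw bytes and replace smart punctuation with ASCII.

    Assumes `raw` is text in `encoding` (cp1252 is only a guess).
    Bytes that are undefined in that encoding become U+FFFD.
    """
    text = raw.decode(encoding, errors="replace")
    return text.translate(SMART_TO_ASCII)


if __name__ == "__main__":
    # 0x93/0x94 are curly double quotes and 0x92 is a curly
    # apostrophe in cp1252.
    print(to_plain_text(b"\x93It\x92s fine,\x94 she said."))
```

Characters outside the table pass through unchanged, so accented 
letters survive; only the listed punctuation is rewritten.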

Any hints welcome.

Charles Hartman



