How to decode the RTF character set?

M.-A. Lemburg mal at egenix.com
Mon Feb 1 13:11:24 EST 2010


Stef Mientki wrote:
> hello,
> 
> I want to translate rtf files to unicode strings.
> I succeeded in removing all the tags,
> but now I'm stuck on the special accented characters,
> like :
> 
> "Vóór"
> 
> the character "ó" is represented by the string r"\'f3",
> or in bytes: 92, 39, 102, 51
> 
> so I think I need a way to translate that into the string r"\xf3"
> but I can't find a way to accomplish that.
> 
> Any suggestions are very welcome.

You could try something along these lines:

>>> s = r"\'f3"
>>> s = s.replace("\\'", "\\x")
>>> u = s.decode('unicode-escape')
>>> u
u'\xf3'

However, this assumes that the RTF text uses Latin-1 codes.
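
If the codes are not Latin-1, or if you are on Python 3 (where str has
no .decode() method), something like the following might work instead.
This is only a rough sketch: the function name decode_rtf_hex_escapes
and the default of cp1252 are my own assumptions, not anything taken
from the RTF file itself. The code page actually in use is normally
declared in the RTF header (the \ansicpgN control word), so the encoding
argument should be adjusted to match it.

```python
import re

# Matches an RTF hex escape such as \'f3 (backslash, apostrophe, two hex digits).
_HEX_ESCAPE = re.compile(rb"\\'([0-9a-fA-F]{2})")


def decode_rtf_hex_escapes(raw: bytes, encoding: str = "cp1252") -> str:
    """Turn each \\'hh escape into the single byte it names, then decode.

    `encoding` is an assumption; it should match the code page
    declared in the RTF header.
    """
    unescaped = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return unescaped.decode(encoding)


print(decode_rtf_hex_escapes(rb"V\'f3\'f3r"))  # Vóór
```

Substituting raw bytes first and decoding afterwards keeps the choice of
code page in one place, rather than baking Latin-1 into the replacement
step.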

-- 
Marc-Andre Lemburg
eGenix.com
