Replacement in unicodestrings?

"Martin v. Löwis" martin at v.loewis.de
Sun Oct 5 01:31:22 EDT 2008


>         s_str=repr(s.encode('UTF-8'))

It would be easier to encode this in cp1252 here, as this is apparently
the encoding that you want to use in the RTF file, too. You could then
loop over the encoded string, replacing every byte >= 128 with the
result of formatting its value with "\\'%.2x".
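
A minimal sketch of that loop, assuming Python 3 (where iterating over a
bytes object yields integers). The function name rtf_escape_cp1252 is
made up for illustration, and the sketch does not escape backslashes or
braces, which RTF also treats specially:

```python
def rtf_escape_cp1252(text):
    """Encode text as cp1252 and write each byte >= 128 as an RTF \\'hh escape."""
    out = []
    for byte in text.encode('cp1252'):    # each item is an int in Python 3
        if byte >= 128:
            out.append("\\'%.2x" % byte)  # e.g. 0xeb -> \'eb
        else:
            out.append(chr(byte))         # plain ASCII passes through unchanged
    return ''.join(out)


print(rtf_escape_cp1252('Arj\xebn'))
```

Characters that have no cp1252 representation make the encode() call
raise UnicodeEncodeError, so they would need separate handling.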

As yet another alternative, you could create a Unicode error handler
(call it 'rtf'), and then do

          return s.encode('ascii', errors='rtf')
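
One way such a handler might look, again assuming Python 3 and cp1252 as
the target code page. codecs.register_error is the standard registration
hook; the names rtf_error_handler and to_rtf are hypothetical:

```python
import codecs


def rtf_error_handler(exc):
    """Replace unencodable characters with RTF \\'hh escapes (cp1252 bytes)."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]  # the run of non-ASCII characters
    replacement = ''.join("\\'%.2x" % b for b in bad.encode('cp1252'))
    return replacement, exc.end          # resume encoding after the bad run


codecs.register_error('rtf', rtf_error_handler)


def to_rtf(s):
    # In Python 3 this returns a bytes object containing only ASCII.
    return s.encode('ascii', errors='rtf')


print(to_rtf('Arj\xebn'))
```

A character outside cp1252 would raise a second UnicodeEncodeError from
inside the handler; the sketch leaves that case unhandled.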

>         replDic={'\xc3\xa0':"\\'e0",'\xc3\xa4':"\\'e4",'\xc3\xa1':"\\'e1",
>                 '\xc3\xa8':"\\'e8",'\xc3\xab':"\\'eb",'\xc3\xa9':"\\'e9",
>                 '\xc3\xb2':"\\'f2",'\xc3\xb6':"\\'f6",'\xc3\xb3':"\\'f3",
>                 '\xe2\x82\xac':"\\'80"}
>         for k in replDic.keys():
>             if repr(k) in s_str:
>                 s_str=s_str.replace(repr(k),replDic[k])
>         return s_str
> 
> However interactive:
> 
> >>> '\xc3\xab' in 'Arj\xc3\xabn'
> True
> 
> I just don't get it, what's the difference?

It's the repr():

py> '\xc3\xab' in 'Arj\xc3\xabn'
True
py> repr('\xc3\xab') in repr('Arj\xc3\xabn')
False
py> repr('\xc3\xab')
"'\\xc3\\xab'"
py> repr('Arj\xc3\xabn')
"'Arj\\xc3\\xabn'"

repr('\xc3\xab') starts with an apostrophe, which doesn't
appear before the \\xc3 in repr('Arj\xc3\xabn').
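
The same effect can be reproduced in Python 3 with bytes objects, where
repr() additionally adds a b prefix. This is only a sketch; the variable
names are made up, and the last line shows that replacing on the raw
bytes (without repr()) finds the sequence:

```python
data = 'Arj\xebn'.encode('utf-8')  # b'Arj\xc3\xabn'
key = b'\xc3\xab'                  # UTF-8 encoding of e-diaeresis

found_raw = key in data               # the raw bytes do contain the key
found_repr = repr(key) in repr(data)  # the quoted repr of the key does not occur

# Replacing on the bytes themselves avoids the quoting problem entirely.
fixed = data.replace(key, b"\\'eb")

print(found_raw, found_repr, fixed)
```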

HTH,
Martin


