unicode question
Tim Roberts
timr at probo.com
Sat Feb 25 03:01:21 EST 2006
Edward Loper <edloper at gradient.cis.upenn.edu> wrote:
>I would like to convert an 8-bit string (i.e., a str) into unicode,
>treating chars \x00-\x7f as ascii, and converting any chars \x80-\xff
>into backslashed escape sequences. I.e., I want something like this:
>
>    >>> decode_with_backslashreplace('abc \xff\xe8 def')
>    u'abc \\xff\\xe8 def'
>
>The best I could come up with was:
>
>    def decode_with_backslashreplace(s):
>        "str -> unicode"
>        return (s.decode('latin1')
>                 .encode('ascii', 'backslashreplace')
>                 .decode('ascii'))
>
>Surely there's a better way than converting back and forth 3 times?
I didn't check whether this was faster, although I rather suspect it is
not:
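# ASCII chars pass through; anything >= 0x80 becomes a literal \xNN escape.
# ('%02x' is used rather than hex(), which would prepend a stray '0x'.)
cvt = lambda x: ord(x) < 0x80 and x or '\\x%02x' % ord(x)

def decode_with_backslashreplace(s):
    return u''.join(map(cvt, s))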
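
Another possibility that avoids the round trips is a custom decoding error handler registered through codecs.register_error. The following is only a sketch: the handler name 'backslashescape_decode' and the helper _backslash_escape_bytes are made up for illustration, and it has not been benchmarked against the versions above.

```python
import codecs

def _backslash_escape_bytes(exc):
    # Hypothetical helper: called by the codec machinery for each
    # undecodable run of bytes. Only decoding errors are handled here.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # bytearray() yields integers on both Python 2 and Python 3.
    bad = bytearray(exc.object[exc.start:exc.end])
    escaped = u''.join(u'\\x%02x' % b for b in bad)
    # Return the replacement text and the position at which to resume.
    return escaped, exc.end

# 'backslashescape_decode' is an assumed name, not a built-in handler.
codecs.register_error('backslashescape_decode', _backslash_escape_bytes)

def decode_with_backslashreplace(s):
    "8-bit string -> unicode, with non-ASCII bytes escaped as \\xNN"
    return s.decode('ascii', 'backslashescape_decode')
```

With this in place, decode_with_backslashreplace(b'abc \xff\xe8 def') should give u'abc \\xff\\xe8 def' in a single decoding pass. On Python 3.5 and later the built-in 'backslashreplace' handler is also accepted when decoding, so s.decode('ascii', 'backslashreplace') should give the same result there without a custom handler.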
--
- Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.