unicode question

Sat Feb 25 03:01:21 EST 2006

Edward Loper <edloper at gradient.cis.upenn.edu> wrote:

>I would like to convert an 8-bit string (i.e., a str) into unicode,
>treating chars \x00-\x7f as ascii, and converting any chars \x80-\xff
>into backslashed escape sequences.  I.e., I want something like this:
>
> >>> decode_with_backslashreplace('abc \xff\xe8 def')
>u'abc \\xff\\xe8 def'
>
>The best I could come up with was:
>
>   def decode_with_backslashreplace(s):
>       "str -> unicode"
>       return (s.decode('latin1')
>                .encode('ascii', 'backslashreplace')
>                .decode('ascii'))
>
>Surely there's a better way than converting back and forth 3 times?

I didn't check whether this was faster, although I rather suspect it is
not:

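  # Pass ASCII through; render anything else as a literal \xNN escape.
  # ('%02x' rather than hex(), which would add its own '0x' prefix.)
  cvt = lambda x: ord(x)<0x80 and x or '\\x%02x' % ord(x)
  def decode_with_backslashreplace(s):
      return u''.join(map(cvt,s))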
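
Another way to get the conversion done in a single decode pass is a codec
error handler.  What follows is only a minimal sketch, not a tested
drop-in: it is written for Python 3, where the 8-bit string is a bytes
object, and the handler name 'hexescape' and the function
hexescape_errors are hypothetical names made up for this example.

```python
import codecs

def hexescape_errors(exc):
    """Replace each undecodable byte with a literal \\xNN escape."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Iterating over a bytes object yields ints in Python 3.
    replacement = ''.join('\\x%02x' % b for b in bad)
    # Return the replacement text and the position to resume decoding at.
    return replacement, exc.end

# 'hexescape' is a hypothetical handler name chosen for this sketch.
codecs.register_error('hexescape', hexescape_errors)

def decode_with_backslashreplace(s):
    "bytes -> str, in one decode pass"
    return s.decode('ascii', 'hexescape')
```

With this in place, decode_with_backslashreplace(b'abc \xff\xe8 def')
gives 'abc \\xff\\xe8 def'.  On Python 3.5 and later the built-in
'backslashreplace' handler also works for decoding, so
s.decode('ascii', 'backslashreplace') should give the same result
without registering anything.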
-- 
- Tim Roberts, timr at probo.com
  Providenza & Boekelheide, Inc.