converting to and from octal-escaped UTF-8

Sun Dec 2 23:01:04 EST 2007

On Dec 2, 8:38 pm, Michael Goerz <answer... at 8439.e4ward.com> wrote:
> Michael Goerz wrote:
> > Hi,
>
> > I am writing Unicode strings into a special text file that requires
> > non-ASCII characters to be written as octal-escaped UTF-8 codes.
>
> > For example, the letter "Í" (Latin capital I with acute, code point 205)
> > would come out as "\303\215".
>
> > I will also have to read back from the file later on and convert the
> > escaped characters back into a unicode string.
>
> > Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> > vice versa?
>
> > I know I can get the code point by doing
> >>>> "Í".decode('utf-8').encode('unicode_escape')
> > but there doesn't seem to be any similar method for getting the octal
> > escaped version.
>
> > Thanks,
> > Michael
>
> I've come up with the following solution. It's not very pretty, but it
> works (no bugs, I hope). Can anyone think of a better way to do it?
>
> Michael
> _________
>
> import binascii
>
> def escape(s):
>     hexstring = binascii.b2a_hex(s)
>     result = ""
>     while len(hexstring) > 0:
>         (hexbyte, hexstring) = (hexstring[:2], hexstring[2:])
>         octbyte = oct(int(hexbyte, 16)).zfill(3)
>         result += "\\" + octbyte[-3:]
>     return result
>
> def unescape(s):
>     result = ""
>     while len(s) > 0:
>         if s[0] == "\\":
>             (octbyte, s) = (s[1:4], s[4:])
>             try:
>                 result += chr(int(octbyte, 8))
>             except ValueError:
>                 result += "\\"
>                 s = octbyte + s
>         else:
>             result += s[0]
>             s = s[1:]
>     return result
>
> print escape("\303\215")
> print unescape('adf\\303\\215adf')
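A quick Python 3 check of the arithmetic in the question (a sketch, not code from the thread, whose code is Python 2): "Í" is U+00CD, its UTF-8 encoding is the two bytes 0xC3 0x8D, and those bytes are 195 and 141 in decimal, i.e. 303 and 215 in octal. The helper name `unescape_via_codec` is hypothetical. It relies on the standard `unicode_escape` codec mentioned in the question, which understands `\ooo` octal escapes. It assumes the escaped text is pure ASCII, and it will also interpret any other backslash sequence (such as `\n`) that happens to appear in the text.

```python
# Worked example: U+00CD (decimal 205) is the letter from the question.
letter = "\u00cd"
utf8 = letter.encode("utf-8")                # the two bytes 0xC3 0x8D
octal = "".join("\\%03o" % b for b in utf8)  # each byte written as \ooo
print(octal)                                 # prints \303\215


def unescape_via_codec(escaped: str) -> str:
    """Hypothetical helper: decode octal-escaped UTF-8 back to text.

    unicode_escape turns each \\ooo into the code point with that value
    (0-255); latin-1 maps those code points one-to-one back to raw bytes,
    which are then decoded as UTF-8.
    """
    raw = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode("utf-8")


decoded = unescape_via_codec("adf\\303\\215adf")  # 'adf' + U+00CD + 'adf'
```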

Looks like escape() can be a bit simpler...

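def escape(s):
  result = []
  for char in s:
    # "\\" is an explicit backslash; %03o zero-pads to three octal digits,
    # so bytes below octal 100 still unescape correctly with unescape() above
    result.append("\\%03o" % ord(char))
  return ''.join(result)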
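The zero-padding matters because the quoted unescape() always reads exactly three characters after a backslash. For anyone on Python 3, both directions might be sketched as below. This is not code from the thread, and two choices are assumptions about the file format: only non-ASCII bytes are escaped, as the original question asks, and the backslash byte is also escaped (as \134) so that a literal backslash in the input cannot be mistaken for the start of an escape. The regular expression accepts only \000 through \377, so every match maps to exactly one byte.

```python
import re

# A backslash followed by exactly three octal digits in the byte range \000-\377.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def escape(text: str) -> str:
    """Encode text as UTF-8, writing selected bytes as 3-digit octal escapes.

    Assumption: besides non-ASCII bytes, the backslash byte (0x5C) is also
    escaped, as \\134, so that unescape() can round-trip any input.
    """
    out = []
    for byte in text.encode("utf-8"):
        if byte >= 0x80 or byte == 0x5C:
            out.append("\\%03o" % byte)  # always three digits, e.g. \303
        else:
            out.append(chr(byte))        # other ASCII bytes pass through unchanged
    return "".join(out)


def unescape(escaped: str) -> str:
    """Reverse escape(): turn each \\ooo back into its byte, then decode UTF-8."""
    raw = escaped.encode("ascii")  # the escaped form is assumed to be pure ASCII
    raw = _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return raw.decode("utf-8")


if __name__ == "__main__":
    sample = "adf\u00cdadf"
    print(escape(sample))                      # prints adf\303\215adf
    assert unescape(escape(sample)) == sample
    assert unescape(escape("a\\b")) == "a\\b"  # a literal backslash survives
```

Working on bytes rather than characters keeps each multi-byte UTF-8 sequence intact until the final decode, which is also where a malformed sequence would raise UnicodeDecodeError.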

Regards,
Jordan