converting to and from octal escaped UTF-8

Michael Spencer mahs at telcopartners.com
Mon Dec 3 00:46:27 EST 2007


Michael Goerz wrote:
> Hi,
> 
> I am writing unicode strings into a special text file that requires
> non-ascii characters to be written as octal-escaped UTF-8 codes.
> 
> For example, the letter "Í" (latin capital I with acute, code point 205)
> would come out as "\303\215".
> 
> I will also have to read back from the file later on and convert the
> escaped characters back into a unicode string.
> 
> Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> vice versa?
> 
Perhaps something along the lines of:

  >>> def encode(source):
  ...     return "".join("\\%o" % ord(c) for c in source.encode('utf8'))
  ...
  >>> def decode(encoded):
  ...     bytes = "".join(chr(int(c, 8)) for c in encoded.split('\\')[1:])
  ...     return bytes.decode('utf8')
  ...
  >>> encode(u"Í")
  '\\303\\215'
  >>> print decode(_)
  Í
  >>>
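
Note that the session above escapes every byte, ASCII included, and its decode expects fully escaped input. If only the non-ASCII characters should be escaped, as the question describes, a variant along these lines might work. This is only a sketch, written for Python 3 rather than the Python 2 used above. It assumes the file holds plain ASCII plus three-digit octal escapes and has no rule for escaping a literal backslash (the real file format may differ). The name _OCTAL_ESCAPE is made up for illustration.

```python
import re

# Bytes of non-ASCII characters in UTF-8 are always 0x80-0xFF, i.e. octal
# 200-377, so only escapes in that range are treated as encoded bytes.
# (Assumption: the file format has no separate escape for a literal backslash.)
_OCTAL_ESCAPE = re.compile(r"\\([23][0-7]{2})")


def encode(source):
    """Return source with non-ASCII characters as octal-escaped UTF-8 bytes."""
    out = []
    for byte in source.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))          # ASCII passes through unchanged
        else:
            out.append("\\%03o" % byte)    # e.g. 0xC3 -> \303
    return "".join(out)


def decode(encoded):
    """Reverse encode(): turn octal escapes back into the original text."""
    # Replace each escape with the character whose code point equals the
    # byte value, then reinterpret those code points as raw bytes (latin-1
    # maps 0-255 one to one) and decode the bytes as UTF-8.
    raw = _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), encoded)
    return raw.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    escaped = encode("a\u00cdb")
    print(escaped)
    print(decode(escaped))
```

One caveat with this approach: source text that already contains a backslash followed by three octal digits in the 200-377 range would be misread when decoding, so a real format would need some rule for that case.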

HTH
Michael



