converting to and from octal escaped UTF-8
Michael Spencer
mahs at telcopartners.com
Mon Dec 3 00:46:27 EST 2007
Michael Goerz wrote:
> Hi,
>
> I am writing unicode strings into a special text file that requires
> non-ascii characters to be written as octal-escaped UTF-8 codes.
>
> For example, the letter "Í" (latin capital I with acute, code point 205)
> would come out as "\303\215".
>
> I will also have to read back from the file later on and convert the
> escaped characters back into a unicode string.
>
> Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> vice versa?
>
Perhaps something along the lines of:
>>> def encode(source):
...     return "".join("\\%o" % ord(c) for c in source.encode('utf8'))
...
>>> def decode(encoded):
...     raw = "".join(chr(int(c, 8)) for c in encoded.split('\\')[1:])
...     return raw.decode('utf8')
...
>>> encode(u"Í")
'\\303\\215'
>>> print decode(_)
Í
>>>
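Note that the functions above escape every byte, including plain ASCII, and decode() expects its input to consist of nothing but escapes. If the file mixes ordinary ASCII text with escaped characters, something like the following might be closer to what is needed. It is only a sketch, written for Python 3 rather than the Python 2 used above; the names encode_octal and decode_octal are made up for illustration, and escaping the backslash itself as \134 is my assumption to keep the round trip unambiguous, not something the file format is known to require.

```python
import re

def encode_octal(text):
    """Return text with non-ASCII UTF-8 bytes (and backslashes) as \\ooo escapes."""
    pieces = []
    for b in text.encode("utf-8"):
        if b >= 0x80 or b == 0x5C:
            # Assumption: the backslash is escaped too, so decoding is unambiguous.
            pieces.append("\\%03o" % b)
        else:
            pieces.append(chr(b))
    return "".join(pieces)

def decode_octal(escaped):
    """Turn each \\ooo escape back into a single byte, then decode as UTF-8."""
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                  # three octal digits, at most \377
        lambda m: bytes([int(m.group(1), 8)]),
        escaped.encode("ascii"),                # escaped text is pure ASCII
    )
    return raw.decode("utf-8")

print(encode_octal("Í"))                 # the question's example character
print(decode_octal(encode_octal("aÍb")))
```

Plain ASCII passes through unchanged here, so only the characters that need it are escaped.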
HTH
Michael