converting to and from octal escaped UTF-8
Michael Goerz
answer654 at 8439.e4ward.com
Sun Dec 2 21:38:16 EST 2007
Michael Goerz wrote:
> Hi,
>
> I am writing unicode strings into a special text file that requires
> non-ascii characters to be written as octal-escaped UTF-8 codes.
>
> For example, the letter "Í" (latin capital I with acute, code point 205)
> would come out as "\303\215".
>
> I will also have to read back from the file later on and convert the
> escaped characters back into a unicode string.
>
> Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> vice versa?
>
> I know I can get the code point by doing
>>>> "Í".decode('utf-8').encode('unicode_escape')
> but there doesn't seem to be any similar method for getting the octal
> escaped version.
>
> Thanks,
> Michael
I've come up with the following solution. It's not very pretty, but it
works (no bugs, I hope). Can anyone think of a better way to do it?
Michael
_________
import binascii

def escape(s):
    hexstring = binascii.b2a_hex(s)
    result = ""
    while len(hexstring) > 0:
        (hexbyte, hexstring) = (hexstring[:2], hexstring[2:])
        octbyte = oct(int(hexbyte, 16)).zfill(3)
        result += "\\" + octbyte[-3:]
    return result

def unescape(s):
    result = ""
    while len(s) > 0:
        if s[0] == "\\":
            (octbyte, s) = (s[1:4], s[4:])
            try:
                result += chr(int(octbyte, 8))
            except ValueError:
                result += "\\"
                s = octbyte + s
        else:
            result += s[0]
            s = s[1:]
    return result

print escape("\303\215")
print unescape('adf\\303\\215adf')
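
One possible shorter variant, sketched here under the assumption of Python 3 (the code above is Python 2), where str and bytes are separate types. Unlike escape() above, which escapes every byte, this sketch leaves plain ASCII untouched and only escapes the bytes of non-ASCII characters, which seems closer to what the file format asks for. The regular expression and the helper name _OCTAL are illustrative choices, not part of the original code.

```python
import re

def escape(text):
    """Replace each non-ASCII character with the octal-escaped bytes
    of its UTF-8 encoding; ASCII characters are passed through."""
    out = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            out.append(chr(byte))          # plain ASCII, keep as is
        else:
            out.append("\\%03o" % byte)    # e.g. 0xC3 -> \303
    return "".join(out)

# A backslash followed by exactly three octal digits; the first digit
# is limited to 0-3 so the value always fits in one byte.
_OCTAL = re.compile(rb"\\([0-3][0-7]{2})")

def unescape(escaped):
    """Turn octal escapes back into bytes, then decode as UTF-8."""
    raw = escaped.encode("utf-8")
    raw = _OCTAL.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return raw.decode("utf-8")

print(escape("\u00cd"))                 # the letter from the example
print(unescape("adf\\303\\215adf"))
```

As with the original, a literal backslash followed by three octal digits in the input is indistinguishable from an escape, and unescape() will raise UnicodeDecodeError if the escaped bytes do not form valid UTF-8.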