converting to and from octal escaped UTF-8

Michael Goerz answer654 at 8439.e4ward.com
Sun Dec 2 21:38:16 EST 2007


Michael Goerz wrote:
> Hi,
> 
> I am writing unicode strings into a special text file that requires
> non-ascii characters to be written as octal-escaped UTF-8 codes.
> 
> For example, the letter "Í" (latin capital I with acute, code point 205)
> would come out as "\303\215".
> 
> I will also have to read back from the file later on and convert the
> escaped characters back into a unicode string.
> 
> Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> vice versa?
> 
> I know I can get the code point by doing
>>>> "Í".decode('utf-8').encode('unicode_escape')
> but there doesn't seem to be any similar method for getting the octal
> escaped version.
> 
> Thanks,
> Michael
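
The example in the question can be checked byte by byte. The following is a minimal Python 3 sketch (the thread itself uses Python 2), and the variable names are illustrative only:

```python
# Python 3 sketch: why "Í" comes out as \303\215.
ch = "\u00cd"                      # "Í", code point 205 (0xCD)
utf8 = ch.encode("utf-8")          # two bytes: 0xC3 0x8D
# Format each byte as a backslash followed by three octal digits.
escaped = "".join("\\%03o" % b for b in utf8)
print(ord(ch), utf8.hex(), escaped)
```

The code point 205 needs two bytes in UTF-8, and each byte is written as a backslash plus its three-digit octal value.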

I've come up with the following solution. It's not very pretty, but it
works (no bugs, I hope). Can anyone think of a better way to do it?

Michael
_________

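# Python 2: s is a byte string (str) that already holds UTF-8 data.
import binascii

def escape(s):
    # Note: every byte is escaped, ASCII included.
    hexstring = binascii.b2a_hex(s)
    result = ""
    while len(hexstring) > 0:
        # Take two hex digits (one byte) at a time.
        (hexbyte, hexstring) = (hexstring[:2], hexstring[2:])
        # oct() yields e.g. '0303'; keep the last three digits.
        octbyte = oct(int(hexbyte, 16)).zfill(3)
        result += "\\" + octbyte[-3:]
    return result

def unescape(s):
    result = ""
    while len(s) > 0:
        if s[0] == "\\":
            # Try to read the three characters after the backslash as octal.
            (octbyte, s) = (s[1:4], s[4:])
            try:
                result += chr(int(octbyte, 8))
            except ValueError:
                # Not a valid escape: keep the backslash and re-scan the rest.
                result += "\\"
                s = octbyte + s
        else:
            result += s[0]
            s = s[1:]
    return result

print escape("\303\215")
print unescape('adf\\303\\215adf')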
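
For comparison, here is a minimal Python 3 sketch of the same round trip. It is not the code from this thread. It assumes the input is a text string (`str`), that only non-ASCII bytes need escaping, and that the escaped file content is pure ASCII. It also escapes the backslash itself as `\134` so that the round trip stays unambiguous. Whether the target file format expects that is not stated in the thread. The name `_OCTAL` is illustrative.

```python
import re

def escape(text):
    """Encode text as UTF-8 and octal-escape non-ASCII bytes and backslashes."""
    out = []
    for byte in text.encode("utf-8"):
        if byte < 0x80 and byte != 0x5C:
            out.append(chr(byte))          # plain ASCII passes through
        else:
            out.append("\\%03o" % byte)    # e.g. 0xC3 -> \303 (assumption: "\" -> \134)
    return "".join(out)

# A backslash followed by exactly three octal digits with value <= 0o377.
_OCTAL = re.compile(rb"\\([0-3][0-7]{2})")

def unescape(escaped):
    """Turn \\ooo sequences back into bytes, then decode the result as UTF-8."""
    raw = escaped.encode("ascii")  # assumption: the escaped text is pure ASCII
    raw = _OCTAL.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return raw.decode("utf-8")

print(escape("adf\u00cdadf"))
print(unescape("adf\\303\\215adf"))
```

Anything that does not look like a backslash plus three octal digits is left untouched by `unescape`, which roughly mirrors the `ValueError` fallback in the version above.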


