converting to and from octal escaped UTF-8
MonkeeSage
MonkeeSage at gmail.com
Sun Dec 2 23:01:04 EST 2007
On Dec 2, 8:38 pm, Michael Goerz <answer... at 8439.e4ward.com> wrote:
> Michael Goerz wrote:
> > Hi,
>
> > I am writing unicode strings into a special text file that requires
> > non-ASCII characters to be written as octal-escaped UTF-8 codes.
>
> > For example, the letter "Í" (latin capital I with acute, code point 205)
> > would come out as "\303\215".
>
> > I will also have to read back from the file later on and convert the
> > escaped characters back into a unicode string.
>
> > Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> > vice versa?
>
> > I know I can get the code point by doing
> > >>> "Í".decode('utf-8').encode('unicode_escape')
> > but there doesn't seem to be any similar method for getting the octal
> > escaped version.
>
> > Thanks,
> > Michael
>
> I've come up with the following solution. It's not very pretty, but it
> works (no bugs, I hope). Can anyone think of a better way to do it?
>
> Michael
> _________
>
> import binascii
>
> def escape(s):
>     hexstring = binascii.b2a_hex(s)
>     result = ""
>     while len(hexstring) > 0:
>         (hexbyte, hexstring) = (hexstring[:2], hexstring[2:])
>         octbyte = oct(int(hexbyte, 16)).zfill(3)
>         result += "\\" + octbyte[-3:]
>     return result
>
> def unescape(s):
>     result = ""
>     while len(s) > 0:
>         if s[0] == "\\":
>             (octbyte, s) = (s[1:4], s[4:])
>             try:
>                 result += chr(int(octbyte, 8))
>             except ValueError:
>                 result += "\\"
>                 s = octbyte + s
>         else:
>             result += s[0]
>             s = s[1:]
>     return result
>
> print escape("\303\215")
> print unescape('adf\\303\\215adf')
Looks like escape() can be a bit simpler...
def escape(s):
    result = []
    for char in s:
        result.append("\%o" % ord(char))
    return ''.join(result)
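
One caveat: "%o" does not zero-pad, so a byte below octal 100 (a newline, say) comes out as "\12" rather than "\012", and the quoted unescape(), which always reads three digits after the backslash, would misparse it. Below is a minimal round-trip sketch with padding. It assumes Python 3 and bytes input (the code in this thread is Python 2), and the regex-based unescape() is a substitute for the loop quoted above, not the original code.

```python
import re

def escape(data: bytes) -> str:
    """Octal-escape every byte as a backslash plus exactly three digits."""
    # %03o zero-pads, so b"\n" becomes "\012" instead of "\12".
    return "".join("\\%03o" % byte for byte in data)

# A backslash followed by three octal digits in the range 000-377.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")

def unescape(text: str) -> bytes:
    """Turn three-digit octal escapes back into raw bytes.

    Assumption: the escaped file is pure ASCII, so encoding to ASCII
    is lossless. Anything that is not a valid escape is left untouched.
    """
    raw = text.encode("ascii")
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)

escaped = escape("Í".encode("utf-8"))
restored = unescape("adf" + escaped + "adf").decode("utf-8")
```

Here escaped holds the eight characters \303\215, and restored is the unicode string with the original letter back between the two "adf" runs.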
Regards,
Jordan