How to Encode String of Raw UTF-8 into Unicode?

Henry Chang goldspin at gmail.com
Thu Mar 6 20:00:00 EST 2008


Awesome, that works.  Thank you so much!  My confusion over the
different formats made this harder than it should have been.

On Thu, Mar 6, 2008 at 4:53 PM, Gabriel Genellina
<gagsl-py2 at yahoo.com.ar> wrote:
> On Thu, 06 Mar 2008 22:43:58 -0200, Henry Chang <goldspin at gmail.com>
>  wrote:
>
>
>  > Suppose I start out with a raw string of utf-8 code points.
>
>  "utf-8 code points"???
>  Looks like a utf-8 encoded string, and then written in hex format.
>
>
>  >   raw_string = "68656E727963"
>  >
>  > I can coerce it into proper unicode format by slicing out two
>  > characters at a time.
>  >
>  >   unicode_string = u"\x68\x65\x6E\x72\x79\x63"
>  >
>  >   >>> print unicode_string
>  >   henryc
>  >
>  > My question: is there an existing function that can do this (without
>  > having to manually slice the raw text string)?
>
>  Two steps: first decode from hex to string, and then from utf8 string to
>  unicode:
>
>  py> raw_string = "68656E727963"
>  py> raw_string.decode("hex")
>  'henryc'
>  py> raw_string.decode("hex").decode("utf8")
>  u'henryc'
>
>  --
>  Gabriel Genellina
>
>  --
>  http://mail.python.org/mailman/listinfo/python-list

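The two-step decode quoted above relies on the "hex" codec of Python 2
byte strings. In Python 3, str objects have no decode() method, so the
same idea is usually written with bytes.fromhex(). The following is a
minimal sketch under that assumption; the function name hex_to_text is
hypothetical and does not come from the thread.

```python
def hex_to_text(raw_string):
    """Decode a hex dump of UTF-8 bytes into a text string.

    hex_to_text is an illustrative name, not part of any library.
    """
    # Step 1: hex digits -> raw bytes (two hex digits per byte).
    utf8_bytes = bytes.fromhex(raw_string)
    # Step 2: UTF-8 bytes -> text (a str in Python 3).
    return utf8_bytes.decode("utf-8")


if __name__ == "__main__":
    print(hex_to_text("68656E727963"))
```

As in the original answer, the two steps stay separate: the hex step
only recovers the bytes, and the UTF-8 step is what turns multi-byte
sequences into single characters.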
