Convert to Unicode

Chris Angelico rosuav at gmail.com
Thu Apr 7 13:38:42 EDT 2016


On Fri, Apr 8, 2016 at 1:33 AM, Joaquin Alzola
<Joaquin.Alzola at lebara.com> wrote:
> hello  there
> this is a test
>
> (also \n important)
>
> To this Unicode:
> 00680065006c006c006f0020002000740068006500720065000a00740068006900730020006900730020006100200074006500730074000a
> Without the \u and space.

What happens if you have a non-BMP codepoint (one above U+FFFF, outside
the Basic Multilingual Plane)? So far, what you have is pretty
straightforward.

>>> s = "hello  there\nthis is a test\n"
>>> "".join("%04x" % ord(x) for x in s)
'00680065006c006c006f0020002000740068006500720065000a00740068006900730020006900730020006100200074006500730074000a'

But a codepoint above U+FFFF needs five or six hex digits, so "%04x"
emits a longer group and the output can no longer be split back into
fixed four-digit units. You'll need to decide how to handle those.
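For illustration, here is a minimal sketch of two possible ways to handle them; the function names are hypothetical, and which format is right depends on whatever consumes the output. Encoding as UTF-16 big-endian keeps four hex digits per unit (a non-BMP character becomes a surrogate pair of two groups), while using eight digits per codepoint keeps exactly one group per character.

```python
def to_hex_utf16(s: str) -> str:
    """Hex-encode s as UTF-16 big-endian code units, four hex digits each.

    BMP characters come out exactly as with "%04x" % ord(ch); characters
    above U+FFFF become a surrogate pair (two four-digit groups).
    Raises UnicodeEncodeError if s contains lone surrogates.
    """
    return s.encode("utf-16-be").hex()


def to_hex_utf32(s: str) -> str:
    """Alternative: a fixed eight hex digits per codepoint."""
    return "".join("%08x" % ord(ch) for ch in s)


emoji = "\U0001F600"
print("%04x" % ord(emoji))   # '1f600': five digits, breaks fixed-width parsing
print(to_hex_utf16(emoji))   # 'd83dde00': surrogate pair, still 4-digit units
print(to_hex_utf32(emoji))   # '0001f600': one 8-digit group per codepoint

# The UTF-16 form round-trips back to the original string.
h = to_hex_utf16("hello" + emoji)
print(bytes.fromhex(h).decode("utf-16-be") == "hello" + emoji)
```

For text that is all BMP, the UTF-16 version produces the same string as the "%04x" one-liner, so it is a drop-in change if the receiving side understands surrogate pairs.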

ChrisA


