unicode wrap unicode object?

Fredrik Lundh fredrik at pythonware.com
Sat Apr 8 04:42:16 EDT 2006


"ygao" wrote:

> I must use utf-8 for chinese.

yeah, but you shouldn't store it in a *Unicode* string.  Unicode strings
are designed to hold things that you've already decoded (that is, your
chinese text), not the raw UTF-8 bytes.
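
to see why this matters, here's a minimal sketch (the variable names are
made up for illustration, and latin-1 is only an assumption standing in for
whatever put the raw bytes into a Unicode string).  the same three bytes are
one character once decoded, but three bogus characters if they are stored
undecoded, and encoding those again doubles the damage:

```python
raw = b'\xe9\xab\x98'            # UTF-8 bytes for a single Chinese character

text = raw.decode("utf-8")       # properly decoded: one character
wrong = raw.decode("latin-1")    # raw bytes stored as "text": three characters

print(len(raw))                  # 3 bytes
print(len(text))                 # 1 character
print(len(wrong))                # 3 characters, none of them Chinese

# encoding the bogus string as UTF-8 does not give the original bytes back
print(len(wrong.encode("utf-8")))  # 6 bytes
```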

if you store the UTF-8 in an ordinary 8-bit string instead, you can use
the unicode constructor to convert things properly:

    b = "... some utf-8 data ..."

    # turn it into a unicode string
    u = unicode(b, "utf-8")

    # ... do something with it ...

    # turn it back into a utf-8 string
    s = u.encode("utf-8")

    # or use some other encoding
    s = u.encode("big5")

e.g.

    >>> b = '\xe9\xab\x98'
    >>> u = unicode(b, "utf-8")
    >>> u.encode("utf-8")
    '\xe9\xab\x98'
    >>> u.encode("big5")
    '\xb0\xaa'
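
the same round trip can also be spelled with the decode method on the 8-bit
string, which does the same job as the unicode constructor here.  a small
sketch; the reencode helper is hypothetical, made up for this example and
not part of the standard library:

```python
def reencode(data, src="utf-8", dst="big5"):
    """Decode bytes from the src encoding, then encode the text as dst."""
    text = data.decode(src)      # bytes -> Unicode string
    return text.encode(dst)      # Unicode string -> bytes in the new encoding

b = b'\xe9\xab\x98'              # same UTF-8 bytes as in the session above
big5 = reencode(b)               # UTF-8 -> Big5
back = reencode(big5, src="big5", dst="utf-8")  # and back again

print(repr(big5))
print(back == b)
```

the point is that the Unicode string only exists in the middle; what goes in
and what comes out are always encoded bytes.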

</F>





