unicode wrap unicode object?

Sat Apr 8 02:26:38 EDT 2006

"ygao" <ygao2004 at gmail.com> wrote:

> >>> import sys
> >>> sys.setdefaultencoding("utf-8")

hmm.  what kind of bootleg python is that?

>>> import sys
>>> sys.setdefaultencoding("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'setdefaultencoding'

(you're not supposed to change the default encoding, which is why
site.py removes setdefaultencoding from the sys module at startup.
don't work around that; it'll only cause problems in the long run).
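
the usual alternative to changing the default is to name the encoding
explicitly wherever bytes become text.  a minimal sketch, in Python 3
syntax (the sessions in this thread are Python 2), assuming the bytes
really are UTF-8; the variable names are made up for illustration:

```python
# Assumption: `raw` stands in for bytes read from a file, socket, etc.
raw = b"\xe9\xab\x98"

# Decode explicitly, naming the encoding, instead of relying on a default.
text = raw.decode("utf-8")

print(ascii(text))  # '\u9ad8'
```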

> >>> s='\xe9\xab\x98' # this is a utf-8 string
> >>> ss=U'\xe9\xab\x98'
> >>> s
> '\xe9\xab\x98'
> >>> ss
> u'\xe9\xab\x98'
> >>>
> how do I get ss from s?
> Is there a way to do this?

you have UTF-8 *bytes* in a Unicode text string?  sounds like
someone's made a mistake earlier on...

anyway, iso-8859-1 is, in practice, a null transform: it maps each
unicode character in the range U+0000 to U+00FF to the byte with the
same value, so encoding with it gives you the original bytes back:

    >>> s = ss.encode("iso-8859-1")
    >>> s
    '\xe9\xab\x98'
    >>> s.decode("utf-8")
    u'\u9ad8'
    >>> import unicodedata
    >>> unicodedata.name(s.decode("utf-8"))
    'CJK UNIFIED IDEOGRAPH-9AD8'

but it's probably better to fix the code that puts UTF-8 data in your
Unicode strings (look for bogus iso-8859-1 conversions).
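
if you need the workaround in more than one place, the round trip
above can be wrapped in a small helper.  this is only a sketch, in
Python 3 syntax; undo_latin1_misdecode is a hypothetical name, not
something from the standard library, and it assumes the bad string was
produced by decoding UTF-8 bytes as iso-8859-1:

```python
import unicodedata


def undo_latin1_misdecode(text):
    """Hypothetical helper: recover text whose UTF-8 bytes were
    mistakenly decoded as iso-8859-1."""
    # Raises UnicodeEncodeError if any character is above U+00FF,
    # which means the string was not produced by this particular mistake.
    raw = text.encode("iso-8859-1")
    # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    return raw.decode("utf-8")


ss = "\xe9\xab\x98"  # three characters holding UTF-8 byte values
fixed = undo_latin1_misdecode(ss)

print(ascii(fixed))             # '\u9ad8'
print(unicodedata.name(fixed))  # CJK UNIFIED IDEOGRAPH-9AD8
```

both steps raise an exception on input that doesn't fit the assumption,
so a string that was never damaged this way is not silently mangled.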

</F>