Odd unicode() behavior

Wed Aug 30 07:02:09 EDT 2006

maport at googlemail.com wrote:

> The behavior of the unicode built-in function when given a unicode
> string seems a little odd to me:
>
> >>> unicode(u"abc")
> u'abc'
>
> >>> unicode(u"abc", "ascii")
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> TypeError: decoding Unicode is not supported
>
> I don't see why providing the encoding should make the function behave
> differently when given a Unicode string. Surely unicode(s) ought to
> behave exactly the same as unicode(s, sys.getdefaultencoding())?

nope.

if you omit the encoding argument, unicode() behaves pretty much like str(),
using either the __unicode__ method or __str__/__repr__ + decoding to get
a Unicode string.
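
to make the two call forms concrete, here is a minimal sketch (not part of
the original exchange). it assumes that with an encoding argument the call
is a request to decode an 8-bit string, which is why a Unicode argument is
rejected there. the names text_type and Label are hypothetical, and the
version switch is only there so the snippet also runs on Python 3, where
str plays the role of unicode:

```python
import sys

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
# `text_type` is a hypothetical alias introduced for this sketch.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


class Label(object):
    """Hypothetical class, used only to show the conversion path."""

    def __str__(self):
        return "label"


# Without an encoding: a plain conversion, much like str().
same = text_type(u"abc")       # a Unicode string is accepted as it is
from_obj = text_type(Label())  # falls back to __str__ for other objects

# With an encoding: a decode request, which expects a byte string.
decoded = text_type(b"abc", "ascii")

# Asking to decode something that is already Unicode raises TypeError.
try:
    text_type(u"abc", "ascii")
except TypeError as exc:
    error = str(exc)
else:
    error = None
```

so the one-argument form converts, and the two-argument form decodes;
they are different operations rather than one operation with a default.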

see the language reference for details, e.g.:

    http://pyref.infogami.com/unicode

</F>