unicode bit me

Peter Otten __peter__ at web.de
Sun May 10 02:32:34 EDT 2009


anuraguniyal at yahoo.com wrote:

> First of all, thanks everybody for spending time on my confusing post,
> and I apologize for not being clear after so many efforts.
> 
> here is my last try (you are free to ignore my request for free
> advice)

Finally! This is the first of your posts that makes sense to me ;)

> # -*- coding: utf-8 -*-
> 
> class A(object):
> 
>     def __unicode__(self):
>         return u"©au"
> 
>     def __repr__(self):
>         return unicode(self).encode("utf-8")
> 
>     __str__ = __repr__
> 
> a = A()
> u1 = unicode(a)
> u2 = unicode([a])
> 
> Now I am not using print, so it doesn't matter whether stdout can print
> unicode or not.
> My naive question is: why does the line u2 = unicode([a]) throw
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 1: ordinal not in range(128)

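list doesn't have a __unicode__ method. unicode() therefore converts the
list to str as a fallback and then uses sys.getdefaultencoding() to convert
the result to unicode. The default encoding is ASCII, while your __repr__()
returns UTF-8 bytes, so the decoding step fails at the first non-ASCII byte
(0xc2, the first byte of the UTF-8 encoded "©").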
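
A rough sketch of those two steps, with the str() result spelled out as an
explicit byte string so that it runs unchanged on Python 2.6+ and Python 3.
The names list_as_bytes, error and decoded are only illustrative, and the
default encoding is assumed to be the stock "ascii":

```python
# -*- coding: utf-8 -*-

# Step 1: what str([a]) yields in Python 2 for the class A quoted above:
# "[" + repr(a) + "]", where repr(a) returns UTF-8 encoded bytes.
list_as_bytes = b"[" + u"\u00a9au".encode("utf-8") + b"]"

# Step 2: the implicit decode with the default encoding (assumed "ascii").
try:
    list_as_bytes.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # fails on byte 0xc2 at position 1, right after the "["

# Decoding with the encoding that __repr__ actually used works.
decoded = list_as_bytes.decode("utf-8")
```

So the exception comes from the implicit ASCII decode of the str fallback,
not from the list or from __unicode__ itself.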

> shouldn't list class call unicode on its elements? 

No, it calls repr() on its elements. This is done to avoid ambiguous output;
compare the list's own repr with a str()-based join:

>>> items = ["a, b", "[c]"]
>>> items
['a, b', '[c]']
>>> "[%s]" % ", ".join(map(str, items))
'[a, b, [c]]'
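
The same rule can be seen with a class whose __str__ and __repr__ differ.
This is only a small sketch; Item is a made-up class, not something from
your code:

```python
class Item(object):  # hypothetical class, for illustration only
    def __repr__(self):
        return "Item-repr"

    def __str__(self):
        return "Item-str"

# str() of the object itself uses __str__ ...
as_item = str(Item())

# ... but str() of a list formats each element with repr().
as_list = str([Item()])
```

Here as_item is "Item-str" while as_list is "[Item-repr]", so a __str__ (or
__unicode__) defined on the elements never comes into play when the
containing list is converted.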

> I was expecting that, so do I have to do this instead?
> u3 = "["+u",".join(map(unicode,[a]))+"]"

Peter



