unicode bit me

Nick Craig-Wood nick at craig-wood.com
Sun May 10 03:30:05 EDT 2009


anuraguniyal at yahoo.com <anuraguniyal at yahoo.com> wrote:
>  First of all, thanks everybody for putting time into my confusing post,
>  and I apologize for not being clear despite so many efforts.
> 
>  Here is my last try (you are free to ignore my request for free
>  advice).
> 
>  # -*- coding: utf-8 -*-
> 
>  class A(object):
> 
>      def __unicode__(self):
>          return u"©au"
> 
>      def __repr__(self):
>          return unicode(self).encode("utf-8")
> 
>      __str__ = __repr__
> 
>  a = A()
>  u1 = unicode(a)
>  u2 = unicode([a])
> 
>  Now, I am not using print, so it doesn't matter whether stdout can
>  print unicode or not.
>  My naive question: the line u2 = unicode([a]) throws
>  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
>  1: ordinal not in range(128)
> 
>  Shouldn't the list class call unicode() on its elements?

You mean that when you call unicode(a_list), it should call unicode() on
each of the elements to build the result?

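Yes, that does seem sensible. However, list doesn't have a __unicode__
method at all, so I guess unicode() falls back to str() on the list.
That calls repr() on each element (your __repr__, which returns UTF-8
bytes), and the resulting byte string is then decoded with the default
'ascii' codec, which explains your problem exactly.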
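
The two-step fallback described above can be simulated in Python 3 terms, where bytes and text are separate types. This is a sketch only: simulate_unicode_of_list is a hypothetical helper, and it assumes (as in the original post) that each element's __repr__ returns its text encoded as UTF-8.

```python
def simulate_unicode_of_list(element_texts, codec="ascii"):
    """Mimic Python 2's unicode(L) for a list whose elements' __repr__
    returns UTF-8 encoded bytes (as A.__repr__ does in the post).

    Hypothetical helper for illustration; not part of any library.
    """
    # Step 1: str(L) calls repr() on each element and joins the resulting
    # byte strings. Here each repr() is assumed to be the text encoded as UTF-8.
    element_reprs = [text.encode("utf-8") for text in element_texts]
    list_bytes = b"[" + b", ".join(element_reprs) + b"]"
    # Step 2: with no list.__unicode__, unicode() decodes those bytes,
    # by default with the 'ascii' codec.
    return list_bytes.decode(codec)


if __name__ == "__main__":
    try:
        simulate_unicode_of_list([u"\N{COPYRIGHT SIGN}au"])
    except UnicodeDecodeError as exc:
        # Same message as the Python 2 traceback: byte 0xc2 at position 1.
        print(exc)
    # Decoding with the codec the bytes were really written in succeeds.
    print(ascii(simulate_unicode_of_list([u"\N{COPYRIGHT SIGN}au"], codec="utf-8")))
```

Position 1 is the first byte after the opening "[", i.e. the 0xc2 lead byte of the UTF-8 encoded copyright sign.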

If you try your example on Python 3, you don't need the __unicode__
method at all (all strings are Unicode), and I predict you won't have
the problem. (I haven't got a Python 3 in front of me at the moment to
test.)
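
A sketch of how the class might be ported to Python 3 (an assumption about the port, not code from the thread). The key change is that __repr__ returns text rather than UTF-8 bytes; in Python 3, a __repr__ that returns bytes makes repr() raise TypeError.

```python
class A(object):
    # Python 3: str is already Unicode text, so there is no __unicode__
    # and __repr__ must return str, not UTF-8 bytes.
    def __repr__(self):
        return "\N{COPYRIGHT SIGN}au"

    # Redundant in Python 3 (object.__str__ falls back to __repr__),
    # kept only to mirror the original class.
    __str__ = __repr__


a = A()
u1 = str(a)
u2 = str([a])  # list's str() calls repr() on each element; no byte decoding happens
print(ascii(u1), ascii(u2))
```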

So I doubt you'll find much momentum to fix this in Python 2, since
unicode/str integration was the main focus of Python 3, but you could
report a bug. If you attach a patch that fixes it, so much the better!

Here is my demonstration of the problem with Python 2.5.2:

>>> class A(object):
...     def __unicode__(self):
...         return u"\N{COPYRIGHT SIGN}au"
...     def __repr__(self):
...         return unicode(self).encode("utf-8")
...     __str__ = __repr__
...
>>> a = A()
>>> str(a)
'\xc2\xa9au'
>>> repr(a)
'\xc2\xa9au'
>>> unicode(a)
u'\xa9au'
>>> L=[a]
>>> str(L)
'[\xc2\xa9au]'
>>> repr(L)
'[\xc2\xa9au]'
>>> unicode(L)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
1: ordinal not in range(128)
>>> unicode('[\xc2\xa9au]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
1: ordinal not in range(128)
>>> L.__unicode__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute '__unicode__'
>>> unicode(str(L),"utf-8")
u'[\xa9au]'
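
The last line of the session (decoding str(L) as UTF-8) is one workaround. Another, closer to what the original poster expected, is to convert each element to text yourself so nothing ever passes through a byte string. The sketch below is hypothetical (unicode_of_list and text_type are illustrative names, not a standard API) and is written to run on both Python 2 and Python 3.

```python
import sys

# Pick the text type for the running interpreter (unicode on 2.x, str on 3.x).
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def unicode_of_list(items):
    """Hypothetical helper: convert each element with text_type(), then join.

    Unlike unicode(L) on Python 2, this never round-trips through a byte
    string, so no implicit 'ascii' decode happens.
    """
    return u"[" + u", ".join(text_type(item) for item in items) + u"]"


class A(object):
    def __unicode__(self):  # used by unicode() on Python 2
        return u"\N{COPYRIGHT SIGN}au"

    def __str__(self):
        if sys.version_info[0] >= 3:
            return self.__unicode__()  # Python 3: str() must return text
        return self.__unicode__().encode("utf-8")  # Python 2: str() returns bytes

    __repr__ = __str__


print(ascii(unicode_of_list([A()])) if sys.version_info[0] >= 3 else repr(unicode_of_list([A()])))
```

Note that this is a display helper rather than a true repr(): string elements are not quoted the way repr(list) would quote them.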

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick