unicode bit me

Mark Tolonen metolone+gmane at gmail.com
Sat May 9 17:21:43 EDT 2009


"Piet van Oostrum" <piet at cs.uu.nl> wrote in message 
news:m263gagjjl.fsf at cs.uu.nl...
>>>>> "Mark Tolonen" <metolone+gmane at gmail.com> (MT) wrote:

>>MT> <anuraguniyal at yahoo.com> wrote in message
>>MT> news:994147fb-cdf3-4c55-8dc5-62d769b12cdc at u9g2000pre.googlegroups.com...
>>>> Sorry for being unclear again; hmm, I am becoming an expert at it.
>>>>
>>>> I pasted that code as a continuation of my old code at the start,
>>>> i.e.
>>>> class A(object):
>>>>     def __unicode__(self):
>>>>         return u"©au"
>>>>
>>>>     def __repr__(self):
>>>>         return unicode(self).encode("utf-8")
>>>>     __str__ = __repr__
>>>>
>>>> By "doesn't work" I mean it throws a unicode error.
>>>> My question boils down to:
>>>> what is the difference between these two, and why does one throw an
>>>> error while the other doesn't?
>>>> print unicode(a)
>>>> vs
>>>> print unicode([a])

>>MT> That is still an incomplete example.  Your results depend on your
>>MT> source code's encoding and your system's stdout encoding.  Assuming
>>MT> a=A(), unicode(a) returns u'©au', but then is converted to stdout's
>>MT> encoding for display.

>You are confusing the issue. It does not depend on the source code's
>encoding (supposing that the encoding declaration in the source is
>correct). repr returns unicode(self).encode("utf-8"), so it is utf-8
>encoded even when the source code had a different encoding. The u"©au"
>string is not dependent on the source encoding.
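
A rough sketch of why the two calls differ, for anyone reading the archive.
It does not use the OP's class; it only simulates, with explicit byte
strings, what Python 2 does implicitly, on the assumption that the default
codec is ascii (the usual Python 2 setting).  The variable names are made up
for illustration.

```python
# Sketch only: explicit bytes stand in for Python 2's implicit str handling.

text = u"\u00a9au"                  # what __unicode__ returns (u"©au")
repr_bytes = text.encode("utf-8")   # what the OP's __repr__ returns

# unicode(a): __unicode__ is used directly, so no decoding step is needed.
direct = text

# unicode([a]): the list's repr embeds repr(a) as a byte string, and that
# byte string then has to be decoded with the default codec (assumed ascii).
list_repr_bytes = b"[" + repr_bytes + b"]"
try:
    list_repr_bytes.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the codec that was actually used for encoding works.
recovered = list_repr_bytes.decode("utf-8")
```

So unicode(a) never touches the UTF-8 bytes, while unicode([a]) has to
decode them and trips over the first non-ASCII byte.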

Sorry about that.  I'd forgotten that the OP had forced __repr__ to return
UTF-8.  You bring up a good point, though: the encoding the file is actually
saved in and the encoding declaration in the source have to match.  Many
people get that wrong as well.
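
A small sketch of what a mismatch looks like.  It only simulates the decode
step with explicit encode/decode calls; it is not what the interpreter does
internally when it reads a source file, and the variable names are invented
for the example.

```python
text = u"\u00a9au"   # the intended literal, u"©au"

# Case 1: file saved as UTF-8, but the declaration says latin-1.
# No error is raised; the literal silently becomes the wrong characters.
saved_utf8 = text.encode("utf-8")
misread = saved_utf8.decode("latin-1")

# Case 2: file saved as latin-1, but the declaration says UTF-8.
# The lone 0xA9 byte is not valid UTF-8, so decoding fails outright.
saved_latin1 = text.encode("latin-1")
try:
    saved_latin1.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

The first case is the nastier one, since nothing complains and the string
just ends up with an extra character in it.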

-Mark




