Unicode Objects in Tuples

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Oct 11 13:06:05 EDT 2013


On Fri, 11 Oct 2013 09:16:36 +0100, Stephen Tucker wrote:

> I am using IDLE, Python 2.7.2 on Windows 7, 64-bit.
> 
> I have four questions:
> 
> 1. Why is it that
>      print unicode_object
> displays non-ASCII characters in the unicode object correctly, whereas
>      print (unicode_object, another_unicode_object)
> displays non-ASCII characters in the unicode objects as escape sequences
> (as repr() does)?

Because that is the design of Python. Printing compound objects like 
tuples, lists and dicts always uses the repr of the components. 
Otherwise, you couldn't tell the difference between (say) (23, 42) and 
("23", "42").
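
To see the ambiguity concretely, here is a small sketch. The names (str_based and friends) are mine and purely illustrative; it should behave the same under Python 2.7 and Python 3:

```python
numbers = (23, 42)
strings = ("23", "42")

# What Python actually does: str() of a tuple uses repr() of each item,
# so the two tuples stay distinguishable.
actual_numbers = str(numbers)    # "(23, 42)"
actual_strings = str(strings)    # "('23', '42')"

# A hypothetical alternative that used str() of each item instead.
def str_based(tup):
    return "(" + ", ".join(str(item) for item in tup) + ")"

# Under that scheme both tuples would display identically.
ambiguous = str_based(numbers) == str_based(strings)    # True
```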

If you want something different, you have to do it yourself.

However, having said that, it is true that the repr() of Unicode strings 
in Python 2 is rather lame. Python 3 is much better:

[steve@ando ~]$ python2.7 -c "print repr(u'∫ßδЛ')"
u'\xe2\x88\xab\xc3\x9f\xce\xb4\xd0\x9b'

[steve@ando ~]$ python3.3 -c "print(repr('∫ßδЛ'))"
'∫ßδЛ'

So if you have the opportunity to upgrade to Python 3.3, I recommend it.


> 2. Given that this is actually *deliberately* the case (which I, at the
> moment, am finding difficult to accept), what is the neatest (that is,
> the most Pythonic) way to get non-ASCII characters in unicode objects in
> tuples displayed correctly?

I'd go with something like this helper function:

def print_unicode(obj):
    if isinstance(obj, (tuple, list, set, frozenset)):
        print u', '.join(unicode(item) for item in obj)
    else:
        print unicode(obj)


Adjust to taste :-)
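
For comparison, here is a rough sketch of the same idea that returns the text instead of printing it. It assumes Python 3, where str is already the Unicode type, and the name format_items is my own invention, not anything built in:

```python
def format_items(obj):
    """Return display text built from str() of each item, not repr()."""
    if isinstance(obj, (tuple, list, set, frozenset)):
        # Join the str() of every item, so no escape sequences appear.
        return ', '.join(str(item) for item in obj)
    # Anything else is converted as a single value.
    return str(obj)

in_tuple = format_items(('∫ßδЛ', 42))    # '∫ßδЛ, 42'
on_its_own = format_items('∫ßδЛ')        # '∫ßδЛ'
```

Returning a string means the same helper can feed print() or a file's write() method.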


> 3. A similar thing happens when I write such objects and tuples to a
> file opened by
>      codecs.open ( ..., "utf-8")
> I have also found that, even though I use  write  to send the text to
> the file, unicode objects not in tuples get their non-ASCII characters
> sent to the file correctly, whereas, unicode objects in tuples get their
> characters sent to the file as escape sequences. Why is this the case?

Same reason. The default string converter for tuples uses the repr, which 
intentionally uses escape sequences. If you want something different, you 
can program it yourself.
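
As a minimal sketch of doing it yourself for a file: join the items into one Unicode string first, then write that string. Note that I have swapped in io.open where you used codecs.open, and the file name and sample data are made up for the example. It should run under Python 2.7 and 3.3 or later:

```python
import io
import os
import tempfile

# Sample data; the u'' prefix is accepted by Python 2.7 and 3.3+.
items = (u'\u222b\xdf', u'\u03b4\u041b')

# Hypothetical output location, only for this example.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Write the joined text, not the tuple itself, so that no repr()
# (and hence no escape sequence) is involved.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u', '.join(items) + u'\n')

# Read it back to confirm the characters survived.
with io.open(path, encoding='utf-8') as f:
    written = f.read()
```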


-- 
Steven


