Unicode Objects in Tuples

Ned Batchelder ned at nedbatchelder.com
Fri Oct 11 05:22:36 EDT 2013


On 10/11/13 4:16 AM, Stephen Tucker wrote:
> I am using IDLE, Python 2.7.2 on Windows 7, 64-bit.
>
> I have four questions:
>
> 1. Why is it that
>      print unicode_object
> displays non-ASCII characters in the unicode object correctly, whereas
>      print (unicode_object, another_unicode_object)
> displays non-ASCII characters in the unicode objects as escape 
> sequences (as repr() does)?
>
> 2. Given that this is actually deliberately the case (which I, at 
> the moment, am finding difficult to accept), what is the neatest (that 
> is, the most Pythonic) way to get non-ASCII characters in unicode 
> objects in tuples displayed correctly?
>
> 3. A similar thing happens when I write such objects and tuples to a 
> file opened by
>      codecs.open ( ..., "utf-8")
> I have also found that, even though I use  write  to send the text to 
> the file, unicode objects not in tuples get their non-ASCII characters 
> sent to the file correctly, whereas, unicode objects in tuples get 
> their characters sent to the file as escape sequences. Why is this the 
> case?
>
> 4. As for question 1 above, I ask here also: What is the neatest way 
> to get round this?
>
> Stephen Tucker.
>

Although Python 3 is better than Python 2 at Unicode, as the others have 
said, the most important point is one that you hit upon yourself.

When you print an object x, you are actually printing str(x).  The str() 
of a tuple is an opening paren, followed by the repr() of each of its 
elements, separated by commas, then a closing paren.  Tuples and lists 
use the repr() of their elements when producing either their own str() 
or their own repr().
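
Here is a small sketch of that rule.  Word is a made-up class (not from 
your code) whose str() and repr() deliberately differ, so you can see 
which one a container picks.  It should behave the same way in Python 2 
and Python 3:

```python
class Word(object):
    """Hypothetical class with deliberately different str() and repr()."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # The "for humans" form.
        return self.text

    def __repr__(self):
        # The "for developers" form.
        return "Word(%r)" % (self.text,)


w = Word("hello")

alone = str(w)          # uses Word.__str__
in_tuple = str((w, w))  # the tuple calls repr() on each element
in_list = str([w])      # lists do the same

print(alone)     # hello
print(in_tuple)  # (Word('hello'), Word('hello'))
print(in_list)   # [Word('hello')]
```

The same thing happens with your unicode objects: on their own they are 
printed via str(), inside the tuple they are shown via repr().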

Python 3 does better at this because repr() in Python 3 will gladly 
include non-ASCII characters in its output, while Python 2 will only 
include ASCII characters, and so must resort to escape sequences. (BTW: 
if you like the ASCII-only idea from Python 2, Python 3 has the ascii() 
function and the %a string formatting directive for that very purpose.)
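
To make the difference concrete, here is a Python 3 only sketch (ascii() 
and %a do not exist in Python 2).  The sample string is my own, written 
with an escape so the source file stays ASCII:

```python
s = "caf\u00e9"  # the word "cafe" with an acute accent on the e

as_repr = repr(s)        # Python 3 keeps the non-ASCII character as-is
as_ascii = ascii(s)      # ascii() escapes it, like Python 2's repr()
as_percent_a = "%a" % s  # %a formats its argument with ascii()

print(as_ascii)      # 'caf\xe9'
print(as_percent_a)  # 'caf\xe9'
```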

The two string representation alternatives str() and repr() can be 
confusing.  Think of it as: str() is for customers, repr() is for 
developers, or: str() is for humans, repr() is for geeks.   The reason 
tuples use the repr() of their elements is that the parens+commas 
representation of a tuple is geeky to begin with, so it uses repr() of 
its elements, even for str(tuple).

The way to avoid repr() for the elements is to format the tuple 
yourself: build a single unicode string from the elements, then print 
that string or write it to your file.
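
One possible way to do that, as a sketch rather than the one true 
answer: format_tuple is a name I made up, and the parens-and-commas 
layout is just an assumption about what you want to see.  The u"" 
literals work in Python 2 and in Python 3.3+:

```python
def format_tuple(items):
    """Join the text form of each item, without going through repr()."""
    # u"%s" % (item,) gives the unicode/str form of each element.
    return u"(" + u", ".join(u"%s" % (item,) for item in items) + u")"


names = (u"caf\xe9", u"na\xefve")  # sample data with non-ASCII characters
formatted = format_tuple(names)

# str(names) would go through repr() for each element; formatted does not.
```

Printing formatted, or passing it to write() on a file opened with 
codecs.open(..., "utf-8"), then goes through the same path as a single 
unicode object does, so the characters should come out unescaped.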

--Ned.