str vs. repr

Wed Nov 3 20:57:12 EST 1999

[Tim Peters, arguing against the current design of str() and repr()]

> [Randall Hopper wonders about str-vs-repr, Tim explains that lists ask
>  their elements to produce repr() even if the list itself was passed
>  to str()
> ]
> 
> [Guido]
> > Actually, Python's internals *do* follow the convention.  Dicts and
> > lists don't define __str__(), so str() for them defaults to
> > __repr__(), and then of course repr() is used for the items.  It may
> > be confusing, it may not be what you want, but it *is* consistent ;-(
> 
> Not to mention undocumented <wink>.
> 
> [Randall]
> > For consistency, would it make sense to change this for Python 1.5.3 (that
> > is, have sequence and dict types pass 'str-vs-repr'ness down)?
> 
> [Guido]
> > This has been asked a few times before (and usually it's been reported
> > as a bug, which it isn't -- see above).  I happened to see this post
> > and it made me delve deep into my intuition trying to figure out why I
> > don't like propagating str() down container items.
> >
> > Here's what I think it is.  There's no reason why an object's str()
> > should be particularly suited to being included in list syntax.
> 
> This seems much more a consequence of the current design than an argument in
> favor of it.  That is, had Python been designed so that the builtin
> container types did "pass down" str-vs-repr'ness, an object's str() would
> have every reason to produce a string suited to being etc.
> 
> > For example, I could have a list containing the following items:
> >
> > 	1		# an integer
> > 	'1'		# a string (of an integer literal)
> > 	'2, 3'		# a string containing a comma and a space
> > 	'], ['		# a string containing list delimiters
> >
> > Under the proposed rules, this list would print as:
> >
> > 	[1, 1, 2, 3, ], []
> >
> > I would find this confusing
> 
> Me too, but I find the current design more rigidly consistent than useful
> (see below).  In a world where containers passed str() down, a container's
> str() would presumably be responsible for adding disambiguating delimiters
> to element str() results when needed (the container knows its own output
> syntax, and can examine the strings produced by its elements -- not rigidly
> consistent, but useful <wink>).

Hm...  What kind of things would you expect e.g. the list str() to do
to its item str()s?  Put backslashes before commas?
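
One possible answer, sketched in present-day Python 3 rather than the Python 1.5 of this thread: pass str() down, and fall back to repr() only for an element whose str() could be misread as list syntax. The function name list_str and the set of "ambiguous" characters are assumptions made for illustration; neither side of the thread proposed this exact rule.

```python
def list_str(items):
    """Hypothetical str() for a list: use each item's str(), but fall back
    to repr() when that text could be confused with list syntax."""
    ambiguous = set(",[](){}'\"\\")
    parts = []
    for item in items:
        text = str(item)
        needs_quoting = (
            text == ""                    # empty output would vanish
            or text != text.strip()       # leading/trailing whitespace
            or not text.isprintable()     # control characters
            or bool(ambiguous & set(text))
        )
        parts.append(repr(item) if needs_quoting else text)
    return "[" + ", ".join(parts) + "]"


print(list_str([1, '1', '2, 3', '], [']))   # [1, 1, '2, 3', '], [']
print(list_str(["François", "Tim"]))        # [François, Tim]
```

Under this rule the comma and bracket cases are disambiguated, but the integer 1 and the string '1' still print identically.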

> > and I worry that it could be used to fool the user.
> 
> People can already define __repr__ to return anything whatsoever; the
> reports of people getting fooled by this are conspicuous by absence <wink>.
> 
> Here's something wholly typical of what I dislike:
> 
> >>> from Rational import Rat, Format
> >>> Rat.set_df(Format.Format(mode=Format.FIXED, prec=3, use_tag=0))
> Format(mode=Format.MIXED, prec=8, base=10, round=Round.NEAREST_EVEN,
>        use_tag=1, use_letters=1)
> >>> one_tenth = Rat.Rat(.1)
> >>> one_tenth
> Rat(3602879701896397L, 36028797018963968L)
> >>> print one_tenth
> 0.100
> >>>
> 
> That is, in interactive mode, I'm forever using "print" because the default
> of applying repr() to raw expressions produces the output least useful in
> interactive hacking (I don't care about reproducing the object exactly from
> the string when typing a raw expression at the prompt!  The mental ratio of
> two giant integers isn't helpful here.).
> 
> Carry it one more step, and nothing simple suffices anymore:
> 
> >>> values = [one_tenth, one_tenth + 100]
> >>> values
> [Rat(3602879701896397L, 36028797018963968L),
>  Rat(3606482581598293197L, 36028797018963968L)]
> >>> print values
> [Rat(3602879701896397L, 36028797018963968L),
>  Rat(3606482581598293197L, 36028797018963968L)]
> >>>
> 
> So I'm forever typing this instead:
> 
> >>> map(str, values)
> ['0.100', '100.100']
> >>>
> 
> Throw a dict into it, and it's hopeless:
> 
> >>> recip = {one_tenth: 1/one_tenth, 1/one_tenth: one_tenth}
> >>> print recip
> {Rat(3602879701896397L, 36028797018963968L):
>      Rat(36028797018963968L, 3602879701896397L),
>  Rat(36028797018963968L, 3602879701896397L):
>      Rat(3602879701896397L, 36028797018963968L)}
> >>>
> 
> Having gone thru the same business in dozens of classes over the years, I
> find the current design simply unusable in interactive mode.  For a while I
> defined just __str__, bound __repr__ to that too, and added a .repr()
> *method* for the unusual cases in which I really needed a faithful string.
> But that frustrated other code that expected explicit repr() calls, and/or
> the `` notation, to produce the long-winded version.  So that sucked too.
> 
> It's even an irritation sticking to builtin types; e.g., here assuming
> Latin-1 comes across intact:
> 
> >>> names = ["François", "Tim"]
> >>> print names[0]
> François
> >>> print names
> ['Fran\347ois', 'Tim']
> >>>
> 
> That isn't helpful either -- it's frustrating.

These are all good points.
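
The behaviour in the quoted examples can be reproduced with a small sketch. It is written for present-day Python 3, not the Python 1.5 of this thread, and Ratio is a hypothetical stand-in for the Rat class, not the real Rational module.

```python
class Ratio:
    """Hypothetical stand-in for the Rat class in the quoted session."""

    def __init__(self, num, den):
        self.num = num
        self.den = den

    def __repr__(self):
        # Faithful, expression-like form.
        return "Ratio(%d, %d)" % (self.num, self.den)

    def __str__(self):
        # Short, human-friendly form with three decimals.
        return "%.3f" % (self.num / self.den)


one_tenth = Ratio(1, 10)
values = [one_tenth, Ratio(1001, 10)]

print(str(one_tenth))            # 0.100
print(str(values))               # [Ratio(1, 10), Ratio(1001, 10)]
print(list(map(str, values)))    # ['0.100', '100.100']
```

The list has no __str__ of its own, so str(values) ends up calling repr() on each item; only the explicit map(str, ...) gives the short forms.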

In a typical scenario which I wanted to avoid, a user has a variable
containing the string '1' but mistakenly believes that it contains the
integer 1.  (This happens a lot, e.g. it could be read from a file
containing numbers.)  The user tries various numeric operations on the
variable and they all raise exceptions.  The user is inexperienced and
doesn't understand what the exceptions are, but gets the idea to
display its value to see if something's wrong with it.  One of the
first things users learn is to use interactive Python as a power
calculator, so my hypothetical user just types the name of the
variable.  If this used str() to format the value, the user would be no
wiser, and perhaps more confused, since str('1') is the same as
str(1).  So I designed Python's read-eval-print loop to use repr()
instead of str(): when the user tries to display the variable, it will
show the string quotes which are a pretty good hint that it's a
string.  (Alternative scenario: the user has this problem and shows it
to a somewhat more experienced user, who displays the variable and
notes the problem from the output.)
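
The scenario above comes down to the following comparison, shown here as a minimal Python 3 sketch (the variable names are illustrative):

```python
from_file = '1'   # e.g. a value read from a text file
number = 1

# str() hides the difference between the two values...
print(str(from_file) == str(number))    # True

# ...while repr() shows the quotes that give the string away.
print(repr(from_file), repr(number))    # '1' 1
```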

Where have I gone wrong?  It seems that you are suggesting that the
read-eval-print loop (a.k.a. the >>> prompt or the interactive
interpreter) should use str(), not repr().  This would solve your
first example; if str() for lists, tuples and dictionaries were to
apply str() to their items, your second and third example would also
be solved.

We can then argue over what str() of a list L should return; one
extreme possibility would be to return string.join(map(str, L)); a
slightly less radical solution would be '[' + string.join(map(str, L),
', ') + ']'.  In the first case, your last example would go like this:

>>> names
François Tim
>>> 

while the second choice would give

>>> names
[François, Tim]
>>>
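
For comparison, here are the two candidate definitions written out in present-day Python 3, where " ".join(...) plays the role of string.join(...) with its default single-space separator. This is only a sketch of the two formulas discussed above, not a proposed implementation.

```python
names = ["François", "Tim"]

# First candidate: items separated by spaces, no brackets.
plain = " ".join(map(str, names))

# Second candidate: keep the brackets and commas, drop the quotes.
bracketed = "[" + ", ".join(map(str, names)) + "]"

print(plain)         # François Tim
print(bracketed)     # [François, Tim]
print(repr(names))   # ['François', 'Tim']
```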

There may be other solutions -- e.g. in Tcl, a list is displayed as
the items separated by spaces, with the proviso that items containing
spaces are displayed inside (matching) curly braces; unmatched braces
are displayed using backslashes, guaranteeing that the output can be
parsed back into a list with the same value as the original.  (Hey!
That's the same as Python's rule!  So why does it work in Tcl?
Because variables only contain strings, and the equivalent of the Rat
class used above can't be coded in Tcl.)

The problem with displaying 'François' has been mentioned to me
before.  (By the way, I have no idea how to *type* that.  I'm just
cutting and pasting it from Tim's message.)

There's another scenario I was trying to avoid.  This is probably
something that happened once too many times when I was young and
innocent, so I may be overreacting.  Consider the following:

>>> f = open("somefile")
>>> a = f.readline()
>>> print a
%âãÏÓ1.3

>>> 

Now this example is relatively harmless.  Using repr(), I see that the
string contains a \r character that caused the cursor to back up to
the start of the line, overwriting what was already written:

>>> a
'%PDF-1.3\015%\342\343\317\323\015\012'
>>>
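
The same effect can be sketched in present-day Python 3 without a real file. The byte string below is copied from the repr() output above; note that Python 3 writes hex escapes where the session above shows octal ones, and that decoding as Latin-1 is an assumption made only for this illustration.

```python
# First line of the file, as bytes (taken from the repr() shown above).
a = b'%PDF-1.3\r%\xe2\xe3\xcf\xd3\r\n'

# repr() of bytes escapes control and non-ASCII bytes.
print(repr(a))       # b'%PDF-1.3\r%\xe2\xe3\xcf\xd3\r\n'

# Decoded as Latin-1, repr() keeps printable characters but still
# escapes the carriage return and newline.
text = a.decode('latin-1')
print(repr(text))    # '%PDF-1.3\r%âãÏÓ\r\n'

# ascii() escapes every non-ASCII character as well.
print(ascii(text))   # '%PDF-1.3\r%\xe2\xe3\xcf\xd3\r\n'
```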

But the thing that some big bad program did to me long ago was more
like spit out several thousand garbage bytes which contained enough
escape sequences to lock up my terminal requiring me to powercycle and
log in again.  (The fact that the story refers to a terminal indicates
how long ago this was. :-)

So I vowed that *my* language would not (easily) let this happen by
accident, and the way I enforced that was by making sure that all
non-ASCII characters would be printed as octal escapes, unless you use
the print statement.

It's a separate dilemma from the other examples.  My problem here is
that I hate to make assumptions about the character set in use.  How
do I know that '\237' is unprintable but '\241' is a printable
character?  How do I know that the latter is an upside-down
exclamation point?  Should I really assume stdout is capable of
displaying Latin-1?  Strictly, the str() function doesn't even know
that its output is going to stdout.  I suppose I could use isprint(),
and then François could use the locale module in his $PYTHONSTARTUP
file to make it do the right thing.  Is that good enough?  (I just
tried this briefly.  It seems that somehow the locale module doesn't
affect this?!?!)
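
As a rough modern analogue of the isprint() idea, Python 3 offers str.isprintable(), which consults the Unicode character database rather than the C locale. The sketch below treats '\237' and '\241' as Unicode code points, which is itself the Latin-1 assumption questioned above.

```python
import unicodedata

for ch in ('\237', '\241'):
    # Show the escaped form, whether it counts as printable, and its
    # Unicode general category.
    print(ascii(ch), ch.isprintable(), unicodedata.category(ch))

# The second character is the upside-down exclamation point.
print(unicodedata.name('\241'))   # INVERTED EXCLAMATION MARK
```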

I still think that ['a', 'b'] should be displayed like that, and not
like [a, b].  I'm not sure what to do about a dict of Rat()
items, except perhaps giving up on the fiction that __repr__() should
return something expression-like for user-defined classes...

Any suggestions?

--Guido van Rossum (home page: http://www.python.org/~guido/)
