Documentation for "str()" could use some adjustment - Unicode issue

John Nagle nagle at animats.com
Mon Mar 19 13:01:23 EDT 2007


Daniel Nogradi wrote:
>> The Python documentation for "str" says
>> "str([object]) :
>>      Return a string containing a nicely printable representation of an
>> object."
>>
>> However, there's no mention of the fact that "str" of a Unicode string
>> with non-ASCII characters will raise a conversion exception.  The
>> documentation (and several Python books) seem to indicate that "str" will
>> produce some "printable representation" for anything, not raise an
>> exception.
>>
>> I know, it was proposed in PEP 349 to change "str" to return Unicode
>> strings, and that's coming someday, along with all-Unicode Python,
>> but meanwhile, the documentation should be clear about this.
> 
> 
> Hi John, I'm not at all an expert around here but my understanding of
> the workflow is that bug reports should be submitted to the tracker at
> sourceforge. There is no guarantee that they will be noticed there but
> the chances there are much higher than for a message on the mailing
> list. I just mention this because I've seen your mysqld, urllib and
> other potentially important bug reports here with no action taken but
> maybe submitting them to sourceforge would attract some developers.

     More people read the mailing list than the bug reports, and
Google indexes it, so posting here puts the problem on record.  If this
problem were reported as a bug, it would be dismissed as a non-problem
on the grounds that it will be fixed in a future version of Python.

     This was addressed in PEP 349 (status "Deferred"), so it's
a known problem that's being ignored.

     Python strings are in transition; in ASCII-only Python, "str"
could never raise a conversion exception, and when we get to
Unicode-only strings, "str" will never raise a conversion exception.
But right now, "str" can definitely raise an exception: applied to a
Unicode string with non-ASCII characters, it raises UnicodeEncodeError.
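
     A minimal sketch of the failure, for anyone who wants to see it.
This is an emulation, not the interpreter's own code: it assumes that
"str" on a Unicode string amounts to encoding with the default ASCII
codec.  The names py2_style_str and safe_str are hypothetical helpers
made up for illustration.  The u"" literals keep the snippet valid on
both current Python 2 and Python 3.3 or later.

```python
# Emulation only: assumes str() on a Unicode string behaves like an
# implicit encode with the ASCII codec. Both helpers are hypothetical.

def py2_style_str(text):
    """Mimic the implicit conversion: encode with the ASCII codec."""
    return text.encode("ascii")


def safe_str(text):
    """Never raises: unencodable characters become backslash escapes."""
    return text.encode("ascii", "backslashreplace")


# Pure ASCII text converts without trouble.
plain = py2_style_str(u"hello")

# One non-ASCII character is enough to trigger the exception.
try:
    py2_style_str(u"caf\u00e9")
    raised = False
except UnicodeEncodeError:
    raised = True

# The fallback trades the exception for an escaped byte string.
escaped = safe_str(u"caf\u00e9")
```

Whether escaping, replacing, or encoding to UTF-8 is the right
fallback depends on where the output is going; the point is only
that the caller has to choose one explicitly.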

    The real problem is the published books on Python:

"Learning Python", by Lutz and Ascher:
    "str(string) -- returns the string representation of any object."

"Python in a Nutshell", by Martelli:
    Doesn't really address the issue, but says that "print" calls
    "str" for conversions.

Neither of these mentions that "str" of a Unicode string works only
when every character is ASCII.

     I think that you can use "unicode()" on any object on
which you can use "str()", but I'm not sure that's official.

     Incidentally, is "repr" ASCII-only, or does it understand
Unicode?  And what happens if the __str__() method of an object
returns Unicode?
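
     One rough way to probe the "repr" question, assuming a Python 3
interpreter is at hand: its built-in ascii() is documented as producing
a string similar to what repr() returns in Python 2, with non-ASCII
characters escaped.  This is a sketch under that assumption, not a test
of a Python 2 interpreter itself, and it says nothing about the
__str__() question.

```python
# Assumes Python 3, where ascii() is documented to behave like
# Python 2's repr(): non-ASCII characters come back as escapes.
text = u"caf\u00e9"

escaped = ascii(text)

# Every character of the result is plain ASCII, so printing it
# cannot trigger an encoding error on an ASCII-only terminal.
all_ascii = all(ord(ch) < 128 for ch in escaped)
```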
	
				John Nagle


