Python3.3 str() bug?

Oscar Benjamin oscar.j.benjamin at gmail.com
Sat Nov 10 11:45:26 EST 2012


On 9 November 2012 11:08, Helmut Jarausch <jarausch at igpm.rwth-aachen.de> wrote:
> On Fri, 09 Nov 2012 10:37:11 +0100, Stefan Behnel wrote:
>
>> Helmut Jarausch, 09.11.2012 10:18:
>>> probably I'm missing something.
>>>
>>> Using   str(Arg) works just fine if  Arg is a list.
>>> But
>>>   str([],encoding='latin-1')
>>>
>>> gives the error
>>> TypeError: coercing to str: need bytes, bytearray or buffer-like object,
>>>            list found
>>>
>>> If this isn't a bug how can I use str(Arg,encoding='latin-1') in general.
>>> Do I need to flatten any data structure which is normally accepted by str()?
>>
>> Funny idea to call this a bug in Python. What your code is asking for is to
>> decode the object you pass in using the "latin-1" encoding. Since a list is
>> not something that is "encoded", let alone in latin-1, you get an error,
>> and actually a rather clear one.
>>
>> Note that this is not specific to Python3.3 or even 3.x. It's the same
>> thing in Py2 when you call the equivalent unicode() function.
>>
>
> For me it's not funny, at all.

I think the problem is that the str constructor does two fundamentally
different things depending on whether you have supplied the encoding
argument. From help(str) in Python 3.2:

 |  str(object[, encoding[, errors]]) -> str
 |
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.

So str(obj) returns obj.__str__() but str(obj, encoding='xxx') decodes
a byte string (or a similar object) using a given encoding. In most
cases obj will be a byte string and it will be equivalent to using
obj.decode('xxx').
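
As a small illustration of the two behaviours described above, here is
a sketch that should run on Python 3. The variable names and the sample
bytes are only assumptions made for this example:

```python
data = "caf\xe9".encode("latin-1")  # the byte string b'caf\xe9'

# With an encoding, str() decodes the bytes, just like bytes.decode().
decoded_via_str = str(data, encoding="latin-1")
decoded_via_method = data.decode("latin-1")

# Without an encoding, str() falls back to the object's __str__/__repr__.
shown = str(data)

# A list exposes no data buffer, so the decoding form rejects it.
try:
    str([], encoding="latin-1")
except TypeError as exc:
    list_error = exc
```

Here decoded_via_str and decoded_via_method are the same text string,
while shown is the repr of the bytes object, 'b' prefix included.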

I think the help text is a little confusing. It says that encoding
defaults to sys.getdefaultencoding() but doesn't clarify that this
default only applies if errors is given (as a keyword argument), since
otherwise no decoding is performed. Perhaps the help text would be
clearer if it listed the two operations as two separate cases e.g.:

str(object)
  Returns a string object from object.__str__() if it is defined or
otherwise object.__repr__(). Raises TypeError if the returned result
is not a string object.

str(bytes[, encoding[, errors]])
  If either encoding or errors is supplied, creates a new string
object by decoding bytes with the specified encoding. The bytes
argument can be any object that supports the buffer interface.
encoding defaults to sys.getdefaultencoding() and errors defaults to
'strict'.
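
A short sketch of the point about defaults, assuming a current
Python 3 interpreter (the variable names are made up for this
example): supplying errors alone is enough to select the decoding
behaviour.

```python
import sys

data = b"abc"

# Passing only errors is enough to select the decoding behaviour;
# the encoding then falls back to the default (utf-8 on Python 3).
only_errors = str(data, errors="strict")

# With neither argument, no decoding happens and the repr is returned.
no_arguments = str(data)

default_encoding = sys.getdefaultencoding()
```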

> Whenever Python3 encounters a bytestring it needs an encoding to convert it to
> a string.

Well, actually Python 3.3 will happily convert it to a string using
bytes.__repr__ if you supply neither the encoding nor the errors argument:

>>> str(b'this is a byte string')
"b'this is a byte string'"

> If I feed a list of bytestrings or a list of list of bytestrings to
> 'str' , etc, it should use the encoding for each bytestring component of the
> given data structure.

You can always do:

[str(obj, encoding='xxx') for obj in list_of_byte_strings]

> How can I convert a data structure of arbitrarily complex nature, which contains
> bytestrings somewhere, to a string?

Using str(obj) or repr(obj). Of course this relies on the author of
type(obj) defining the appropriate methods and writing the code that
actually converts the object into a string.

> This problem has arisen while converting a working Python2 script to Python3.3.
> Since Python2 doesn't have bytestrings it just works.

In Python 2 ordinary strings are byte strings.

> Tell me how to convert  str(obj) from Python2 to Python3 if obj is an
> arbitrarily complex data structure containing bytestrings somewhere
> which have to be converted to strings with a given encoding?

The str function when used to convert a non-string object into a
string knows nothing about the object you provide except whether it
has __str__ or __repr__ methods. The only processing that is done is
to check that the returned result was actually a string:

>>> class A:
...   def __str__(self):
...     return []
...
>>> a = A()
>>> str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type list)

Perhaps it would help if you explained why you want the string
object. I would only use str(complex_object) to produce something to
print for debugging. In that case I would actually want it to show me
which strings were byte strings by marking them with a 'b' prefix, and
to show non-ASCII bytes with a \x hex escape, as it already does:

>>> a = [1, 2, b'caf\xe9']
>>> str(a)
"[1, 2, b'caf\\xe9']"

If I wanted to convert the object to a string in order to e.g. save it
to a file or database then I would write a function to create the
string that I wanted. I would only use str() to convert elementary
types like int and float into strings.
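
One possible shape for such a function is sketched below. The name
decode_nested, its default encoding and the sample data are all
hypothetical. It only handles the built-in containers and would need
extending for other types (for example namedtuples or user-defined
classes):

```python
def decode_nested(obj, encoding="latin-1"):
    """Return a copy of obj with every bytes/bytearray value decoded to str.

    Hypothetical helper: only dict, list, tuple, set and frozenset are
    traversed; any other object is returned unchanged.
    """
    if isinstance(obj, (bytes, bytearray)):
        return str(obj, encoding=encoding)
    if isinstance(obj, dict):
        return {
            decode_nested(key, encoding): decode_nested(value, encoding)
            for key, value in obj.items()
        }
    if isinstance(obj, (list, tuple, set, frozenset)):
        # Rebuild the same container type from the decoded items.
        return type(obj)(decode_nested(item, encoding) for item in obj)
    return obj


nested = [1, b"caf\xe9", (b"abc", [b"\xfc"]), {b"key": b"value"}]
decoded = decode_nested(nested)

# Only now is str() applied, to a structure that contains no byte strings.
text = str(decoded)
```

After decoding, str(decoded) no longer shows any 'b' prefixes because
every byte string has been replaced by a text string first.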


Oscar


