Thanks for all responses

Chris Angelico rosuav at gmail.com
Tue May 31 17:56:24 EDT 2011


On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
<WolfgangMeiners01 at web.de> wrote:
> Whenever i 'cross the border' of my program, i have to encode the 'list
> of bytes' to an unicode string or decode the unicode string to a 'list
> of bytes' which is meaningful to the world outside.

Most people use "encode" and "decode" the other way around; you encode
a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
you're correct.
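
To make the direction concrete, here is a minimal Python 3 sketch (the variable names are only illustrative, not from your program): encoding turns a Unicode string into bytes, and decoding turns bytes back into a Unicode string.

```python
# Python 3: str is Unicode text, bytes is what crosses the border.
text = 'Müller'

# encode: Unicode string -> bytes
data = text.encode('utf-8')
print(data)                  # b'M\xc3\xbcller'

# decode: bytes -> Unicode string
round_trip = data.decode('utf-8')
print(round_trip == text)    # True
```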

> So encode early, decode lately means, to do it as near to the border as
> possible and to encode/decode i need a coding system, for example 'utf8'

Correct on both counts.

> That means, there should be an encoding/decoding possibility to every
> interface i can use: files, stdin, stdout, stderr, gui (should be the
> most important ones).

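In Python 3, the file objects returned by open() have an encoding. If
you don't pass one, the default comes from your locale (often, but not
always, "utf8"), so it's safest to pass encoding= explicitly. In Python
2, io.open() or codecs.open() gives you the same thing. GUI work depends
on your GUI toolkit, which might well accept Unicode strings directly -
check the docs.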
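
As a rough illustration of doing the conversion right at the file border, here is a small Python 3 sketch. The temporary file name is made up for the example; the point is only that the text-mode file object does the encoding and decoding for you.

```python
import os
import tempfile

# Hypothetical file name, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'name.txt')

# On the way out, the file object turns the str into UTF-8 bytes.
with open(path, 'w', encoding='utf-8') as f:
    f.write('Müller')

# Binary mode shows what is really on disk.
with open(path, 'rb') as f:
    raw = f.read()

# On the way in, the file object hands back a Unicode str.
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()

print(raw)                   # b'M\xc3\xbcller'
print(len(raw), len(text))   # 7 6
```

Inside the program you only ever see the six-character string; the seven bytes exist only at the border.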

>    def __repr__(self):
>        return u'My name is %s' % self.Name

This means that a.__repr__() will return a Unicode string.

>    # this does work
>    print a.__repr__()
>
>    # throws an error if default encoding is ascii
>    # but works if default encoding is utf8
>    print a
>
>    # throws an error because a is not a string
>    print unicode(a, encoding='utf8')

In Python 2, the __repr__ function is supposed to return a str (byte
string) object. See http://docs.python.org/reference/datamodel.html#object.__repr__
for that and other advice on writing __repr__. The problems you're
seeing are a result of the built-in repr() function calling
a.__repr__() and then converting the Unicode return value to a str
using the default encoding (normally ASCII), which fails on the "ü".

This would work:
    def __repr__(self):
        return (u'My name is %s' % self.Name).encode('utf8')

Alternatively, migrate to Python 3, where the default is Unicode
strings. I tested this in Python 3.2 on Windows, but it should work on
anything in the 3.x branch:

class NoEnc:
    def __init__(self, Name=None):
        self.Name = Name

    def __repr__(self):
        return 'My name is %s' % self.Name

if __name__ == '__main__':

    a = NoEnc('Müller')

    # this will still work (print is now a function, not a statement)
    print(a.__repr__())

    # this will work in Python 3.x
    print(a)

    # 'unicode' has been renamed to 'str', but a is not a bytes object,
    # so decoding it makes no sense; this raises TypeError if uncommented
    # print(str(a, encoding='utf8'))

    # to convert it to UTF-8, convert it to a string with str() or
    # repr() and then encode:
    print(str(a).encode('utf8'))
############################

Note that the last one will probably not do what you expect. The
Python 3 'print' function (it's not a statement any more, so you need
parentheses around its argument) wants a Unicode string, so you don't
need to encode it. When you encode a Unicode string as in the last
example, it returns a bytes string (an array of bytes), which looks
like this: b'My name is M\xc3\xbcller'  The print function wants
Unicode, though, so it takes this unexpected object and calls str() on
it, hence the odd display.
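
A small sketch of what happens there, in Python 3 (the variable names are just for illustration):

```python
text = 'My name is Müller'
encoded = text.encode('utf8')

# print() calls str() on anything that is not already a str, and
# str() of a bytes object is just its repr.
shown = str(encoded)
print(shown)                            # b'My name is M\xc3\xbcller'
print(shown == repr(encoded))           # True

# Decoding gets the original text back.
print(encoded.decode('utf8') == text)   # True
```

So in Python 3 you normally just print the string itself and let the output stream do the encoding.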

Hope that helps!

Chris Angelico


