Thanks for all responses

Wolfgang Meiners WolfgangMeiners01 at web.de
Wed Jun 1 13:29:44 EDT 2011


On 31.05.11 23:56, Chris Angelico wrote:
> On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
> <WolfgangMeiners01 at web.de> wrote:
>> Whenever I 'cross the border' of my program, I have to encode the 'list
>> of bytes' to a Unicode string or decode the Unicode string to a 'list
>> of bytes' which is meaningful to the world outside.
> 
> Most people use "encode" and "decode" the other way around; you encode
> a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
> you're correct.

OK. I think I will adapt to the majority on this point.
I think I mixed up
unicodestring=unicode(bytestring,encoding='utf8')
and
bytestring=u'unicodestring'.encode('utf8')
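
For reference, a minimal sketch of the two directions, written for Python 3 (an assumption; the two lines above are Python 2, where the text type is `unicode` and the byte type is `str`). The variable names are only illustrative.

```python
# Python 3: 'str' is the Unicode text type, 'bytes' is the raw byte type.
text = 'Müller'

# encode: text -> bytes (on the way out of the program)
raw = text.encode('utf8')

# decode: bytes -> text (on the way into the program)
round_trip = raw.decode('utf8')

print(type(raw).__name__, type(round_trip).__name__)
print(round_trip == text)
```

The non-ASCII character takes two bytes in UTF-8, so `raw` is one byte longer than `text` has characters.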

> 
>> So 'encode early, decode late' means doing it as near to the border as
>> possible, and to encode/decode I need an encoding, for example 'utf8'.
> 

I think I should change this to 'decode early, encode late'.
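
A rough sketch of what that rule could look like in practice, assuming Python 3. The `io.BytesIO` objects merely stand in for the outside world (a file, a socket, a pipe), and `shout()` is a made-up placeholder for the program's real work.

```python
import io

def shout(text):
    # Inside the program: work on Unicode text only.
    return text.upper()

# Hypothetical "outside world": byte streams standing in for a file or socket.
incoming = io.BytesIO('müller\n'.encode('utf8'))
outgoing = io.BytesIO()

text = incoming.read().decode('utf8')   # decode early, right at the input border
result = shout(text)                    # pure text processing in between
outgoing.write(result.encode('utf8'))   # encode late, right at the output border
```

Everything between the two borders only ever sees Unicode text, so the encoding has to be named in exactly two places.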

> Correct on both counts.
> 
>> That means there should be a way to encode/decode at every
>> interface I can use: files, stdin, stdout, stderr, GUI (these should be the
>> most important ones).
> 
> In Python 3, the file objects (as returned by open()) have an encoding,
> which defaults to the locale's preferred encoding (often, but not
> always, "utf8"). GUI work depends on your GUI toolkit, and
> might well accept Unicode strings directly - check the docs.
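
A small sketch of file I/O with an explicit encoding, assuming Python 3 (in Python 2, `io.open()` or `codecs.open()` accept an encoding in a similar way). The file name and the temporary directory are made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'name.txt')

# Writing in text mode: the file object encodes str -> bytes for us.
with open(path, 'w', encoding='utf8') as f:
    f.write('Müller')

# Reading in text mode: the file object decodes bytes -> str.
with open(path, 'r', encoding='utf8') as f:
    as_text = f.read()

# Reading in binary mode shows the bytes that are actually on disk.
with open(path, 'rb') as f:
    as_bytes = f.read()
```

Passing `encoding` explicitly avoids relying on whatever default the platform happens to choose.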
> 
>>    def __repr__(self):
>>        return u'My name is %s' % self.Name
> 
> This means that repr() will return a Unicode string.
> 
>>    # this does work
>>    print a.__repr__()
>>
>>    # throws an error if default encoding is ascii
>>    # but works if default encoding is utf8
>>    print a
>>
>>    # throws an error because a is not a string
>>    print unicode(a, encoding='utf8')
> 
> The __repr__ function is supposed to return a string object, in Python
> 2. See http://docs.python.org/reference/datamodel.html#object.__repr__
> for that and other advice on writing __repr__. The problems you're
> seeing are a result of the built-in repr() function calling
> a.__repr__() and then treating the return value as an ASCII str, not a
> Unicode string.
> 
> This would work:
>     def __repr__(self):
>         return (u'My name is %s' % self.Name).encode('utf8')
> 
> Alternatively, migrate to Python 3, where the default is Unicode
> strings. I tested this in Python 3.2 on Windows, but it should work on
> anything in the 3.x branch:
> 
> class NoEnc:
>     def __init__(self, Name=None):
>         self.Name = Name
>     def __repr__(self):
>         return 'My name is %s' % self.Name
> 
> if __name__ == '__main__':
> 
>     a = NoEnc('Müller')
> 
>     # this will still work (print is now a function, not a statement)
>     print(a.__repr__())
> 
>     # this will work in Python 3.x
>     print(a)
> 
>     # 'unicode' has been renamed to 'str', but it's already Unicode, so
>     # this makes no sense; it raises a TypeError, hence commented out
>     # print(str(a, encoding='utf8'))
> 
>     # to convert it to UTF-8, convert it to a string with str() or
>     # repr() and then encode:
>     print(str(a).encode('utf8'))
> ############################
> 
> Note that the last one will probably not do what you expect. The
> Python 3 'print' function (it's not a statement any more, so you need
> parentheses around its argument) wants a Unicode string, so you don't
> need to encode it. When you encode a Unicode string as in the last
> example, it returns a bytes string (an array of bytes), which looks
> like this: b'My name is M\xc3\xbcller'  The print function wants
> Unicode, though, so it takes this unexpected object and calls str() on
> it, hence the odd display.
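
To make that last point concrete, here is a short sketch (again assuming Python 3) of what `print` does with a bytes object, plus one possible way to emit the encoded bytes unchanged. The `hasattr` guard is a precaution added for this example, since a replaced `sys.stdout` may not have a `buffer` attribute.

```python
import sys

data = 'My name is Müller'.encode('utf8')

# print() calls str() on a bytes object, which yields its repr, not the text.
shown = str(data)

# To emit the encoded bytes unchanged, write to the underlying binary stream.
if hasattr(sys.stdout, 'buffer'):
    sys.stdout.buffer.write(data + b'\n')
```

In most programs it is simpler to pass the Unicode string straight to `print` and let the output stream do the encoding.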
> 
> Hope that helps!

Yes, it helped a lot. One last question: if I have a free choice and
I don't know either Python 2 or Python 3 very well, which would be the
recommended choice?

> 
> Chris Angelico

Wolfgang



