[issue21331] Reversing an encoding with unicode-escape returns a different result

Thu Apr 24 00:17:11 CEST 2014

R. David Murray added the comment:

To understand why, note that a byte string has no inherent encoding.  So when you call b'utf8string'.decode('unicode_escape'), Python has no way to know how to interpret the non-ASCII bytes in that byte string.  If you want the unicode_escape representation of something, you want 'string'.encode('unicode_escape').  If you then want that as a Python string, you can do:

    'mystring'.encode('unicode_escape').decode('ascii')
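
As a minimal sketch of the difference (assuming Python 3; the variable names are only illustrative), the following contrasts decoding UTF-8 bytes with unicode_escape against the encode-then-decode-as-ASCII idiom above:

```python
# Assumption: Python 3, where str is Unicode text and bytes is raw data.
text = 'ä'

# Decoding UTF-8 bytes with unicode_escape: each non-ASCII byte is read
# as a separate Latin-1 character, so the two UTF-8 bytes of 'ä' turn
# into two unrelated characters.
mangled = text.encode('utf-8').decode('unicode_escape')

# Escaping: unicode_escape output is pure ASCII, so decoding it as ASCII
# is lossless and yields the escaped form as a str.
escaped = text.encode('unicode_escape').decode('ascii')

# Reversing the escape: go back through ASCII bytes, then unicode_escape.
restored = escaped.encode('ascii').decode('unicode_escape')

print(ascii(mangled))   # '\xc3\xa4'
print(escaped)          # \xe4
print(restored == text) # True
```

The round trip is only safe because the intermediate bytes are guaranteed to be ASCII; that is what removes the ambiguity about how to interpret them.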

In theory there ought to be a way to use the codecs module to go directly from unicode string to unicode-escaped string, but I don't know how to do it, since the proposal for the 'transform' method was rejected :)

Just to bend your brain a bit further, note that this does work:

>>> import codecs
>>> codecs.decode(codecs.encode('ä', 'unicode-escape').decode('ascii'), 'unicode-escape')
'ä'
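
Unpacking that one-liner into steps may make it less brain-bending.  This is only a sketch, assuming CPython 3, where codecs.decode with 'unicode-escape' accepts a str as well as bytes; the intermediate names are hypothetical:

```python
import codecs

# Step 1: str -> bytes holding the ASCII-only escaped form.
escaped_bytes = codecs.encode('ä', 'unicode-escape')

# Step 2: bytes -> str, still the escaped form (backslash, x, e, 4).
escaped_text = escaped_bytes.decode('ascii')

# Step 3: the escaped str is passed straight to the decoder, which
# resolves the escape sequence back to the original character.
result = codecs.decode(escaped_text, 'unicode-escape')

print(escaped_bytes)  # b'\\xe4'
print(escaped_text)   # \xe4
print(result)         # ä
```

Step 3 works here because the escaped text contains only ASCII characters, so nothing is left for the decoder to guess about.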

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21331>
_______________________________________