[issue21331] Reversing an encoding with unicode-escape returns a different result
R. David Murray
report at bugs.python.org
Thu Apr 24 00:17:11 CEST 2014
R. David Murray added the comment:
To understand why, note that a byte string has no inherent encoding. So when you call b'utf8string'.decode('unicode_escape'), Python has no way to know how to interpret the non-ASCII bytes in that byte string (the codec simply treats each one as a Latin-1 character). If you want the unicode_escape representation of something, you want to do 'string'.encode('unicode_escape'). If you then want that as a Python string, you can do:
'mystring'.encode('unicode_escape').decode('ascii')
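A minimal sketch of the difference, assuming Python 3 and using 'ä' as an arbitrary sample character (the variable names are illustrative only). It contrasts decoding UTF-8 bytes with unicode_escape against encoding the string with unicode_escape:

```python
text = 'ä'

# Wrong direction: the UTF-8 bytes b'\xc3\xa4' are decoded byte by byte,
# and each non-ASCII byte is read as a Latin-1 character.
utf8_bytes = text.encode('utf-8')
garbled = utf8_bytes.decode('unicode_escape')

# Intended direction: encode the str to its escaped form (pure ASCII bytes),
# then decode those bytes as ASCII to get an ordinary str.
escaped_bytes = text.encode('unicode_escape')
escaped_str = escaped_bytes.decode('ascii')

print(repr(garbled))      # two characters, not the original one
print(repr(escaped_str))  # backslash, 'x', 'e', '4'
```

The first result is two characters long because the two UTF-8 bytes are never recombined into a single code point.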
In theory there ought to be a way to use the codecs module to go directly from unicode string to unicode-escaped string, but I don't know how to do it, since the proposal for the 'transform' method was rejected :)
Just to bend your brain a bit further, note that this does work:
>>> import codecs
>>> codecs.decode(codecs.encode('ä', 'unicode-escape').decode('ascii'), 'unicode-escape')
'ä'
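The same round trip can also be written with only str.encode and bytes.decode. The following is a sketch under the assumption of Python 3; to_escaped and from_escaped are hypothetical helper names, not anything provided by the codecs module:

```python
def to_escaped(s: str) -> str:
    """Return the unicode_escape form of s as an ordinary (ASCII-only) str."""
    return s.encode('unicode_escape').decode('ascii')


def from_escaped(escaped: str) -> str:
    """Reverse to_escaped.

    Encoding to ASCII first raises UnicodeEncodeError if the input still
    contains non-ASCII characters, rather than silently misreading them.
    """
    return escaped.encode('ascii').decode('unicode_escape')


# Arbitrary sample strings chosen for illustration.
samples = ['ä', 'naïve café', 'line\nbreak', 'back\\slash', '\U0001f600']
round_tripped = [from_escaped(to_escaped(s)) for s in samples]
print(round_tripped == samples)
```

Going through ASCII on the way back is a design choice: it makes a stray non-ASCII character fail loudly instead of producing a different string.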
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21331>
_______________________________________