[Python-Dev] PEP 383 and GUI libraries

Terry Reedy tjreedy at udel.edu
Fri May 1 22:21:36 CEST 2009


Zooko O'Whielacronx wrote:
> Following-up to my own post to correct a major error:

> Is it true that
> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
> 'python-escape') will always produce srcbytes ?  That is my Requirement

If you start with bytes, decode with utf-8b to unicode (possibly 
'invalid'), and encode the result back to bytes with utf-8b, you should 
get the original bytes, regardless of what they were.  That is the point 
of PEP 383 -- to reliably roundtrip file 'names' that start as bytes and 
must end as the same bytes but which may not otherwise have a unicode 
decoding.
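A minimal Python 3 sketch of the bytes-first round trip described above. It assumes the error handler this post calls utf-8b, which was released in Python 3.1 under the name 'surrogateescape'. The helper names bytes_to_text and text_to_bytes are hypothetical and only for illustration.

```python
# Bytes -> str -> bytes round trip using the PEP 383 error handler.
# The post calls it "utf-8b"; in released Python 3 it is spelled "surrogateescape".

def bytes_to_text(raw: bytes) -> str:
    # Hypothetical helper: each byte that is not part of valid UTF-8
    # becomes a lone surrogate in the range U+DC80..U+DCFF.
    return raw.decode("utf-8", "surrogateescape")

def text_to_bytes(text: str) -> bytes:
    # Hypothetical helper: lone surrogates U+DC80..U+DCFF are turned
    # back into the original bytes 0x80..0xFF.
    return text.encode("utf-8", "surrogateescape")

samples = [
    b"plain.txt",                   # pure ASCII
    "caf\u00e9.txt".encode("utf-8"),  # valid UTF-8
    b"latin1-caf\xe9.txt",          # 0xE9 alone is not valid UTF-8
    bytes(range(256)),              # every possible byte value
]

for raw in samples:
    text = bytes_to_text(raw)
    # Whatever the bytes were, we get the same bytes back.
    assert text_to_bytes(text) == raw

print("all byte strings round-tripped")
```

The str in the middle may hold lone surrogates (the "possibly invalid" unicode), but that does not matter here, because it is only ever encoded back with the same handler.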

If you start with invalid unicode text, encode to bytes with utf-8b, and 
decode back to unicode, you might instead get a different and valid 
unicode text.  An example was given in the discussion.  I believe this 
would be hard to avoid.  In any case, it does not matter for the use 
case of starting with bytes that one wants to work with temporarily, 
but reliably, as text.
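One illustration of the reverse direction, again assuming Python 3's 'surrogateescape' name for the utf-8b handler. This is not necessarily the example given in the discussion; it just shows how text that starts out as lone surrogates can come back as different, valid text.

```python
# str -> bytes -> str is NOT guaranteed to give back the same str
# when the starting str contains lone surrogates.

odd_text = "\udcc3\udca9"  # two lone surrogates: not valid Unicode text

# Encoding maps U+DCC3 -> 0xC3 and U+DCA9 -> 0xA9.
raw = odd_text.encode("utf-8", "surrogateescape")
assert raw == b"\xc3\xa9"

# Those two bytes happen to be valid UTF-8 for U+00E9 ('é'),
# so decoding yields a different, valid string.
back = raw.decode("utf-8", "surrogateescape")
assert back == "\u00e9"
assert back != odd_text
```

The bytes-first case is unaffected: only bytes that fail to decode are escaped, so a decoded str never contains an escaped sequence that could later be re-read as valid UTF-8.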

Terry Jan Reedy


