[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Wed Apr 29 04:39:20 CEST 2009

Martin v. Löwis wrote:
>> Since the serialization of the Unicode string is likely to use UTF-8,
>> and the string for such a file will include half surrogates, the
>> application may raise an exception when encoding the names for a
>> configuration file. These encoding exceptions will be as rare as the
>> unusual names (which the careful I18N aware developer has probably
>> eradicated from his system), and thus will appear late.
> 
> There are trade-offs to any solution; if there was a solution without
> trade-offs, it would be implemented already.
> 
> The Python UTF-8 codec will happily encode half-surrogates; people argue
> that it is a bug that it does so, however, it would help in this
> specific case.

Can we use this encoding scheme for writing into files as well?  We've
turned the filename with undecodable bytes into a string with half
surrogates.  Writing that string to a file has to turn those surrogates
back into bytes at some level.  Can we use the python-escape error
handler on output to achieve that somehow?
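
To make the question concrete, here is a minimal sketch of the round trip
being asked about.  It assumes the handler under the name it later shipped
with in Python 3.1, `surrogateescape` (this thread calls it python-escape).
The filename bytes and the `names.cfg` file are made up for illustration.

```python
import os
import tempfile

# Hypothetical filename as raw bytes; 0xff can never occur in valid UTF-8.
raw_name = b"caf\xff.txt"

# Decoding with surrogateescape maps the bad byte to the lone surrogate U+DCFF.
name = raw_name.decode("utf-8", errors="surrogateescape")
assert name == "caf\udcff.txt"

# A strict UTF-8 encode refuses the lone surrogate in current Python 3.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Passing the same handler on output turns the surrogate back into 0xff.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.cfg")  # illustrative config file
    with open(path, "w", encoding="utf-8",
              errors="surrogateescape", newline="\n") as f:
        f.write(name + "\n")
    with open(path, "rb") as f:
        stored = f.read()

print(strict_ok, stored)
```

Note that the file written this way contains the original undecodable byte,
so it is not valid UTF-8 itself.  A reader has to open it with the same
error handler to get the name back.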

-Toshio
