[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Toshio Kuratomi
a.badger at gmail.com
Wed Apr 29 04:39:20 CEST 2009
Martin v. Löwis wrote:
>> Since the serialization of the Unicode string is likely to use UTF-8,
>> and the string for such a file will include half surrogates, the
>> application may raise an exception when encoding the names for a
>> configuration file. These encoding exceptions will be as rare as the
>> unusual names (which the careful I18N aware developer has probably
>> eradicated from his system), and thus will appear late.
>
> There are trade-offs to any solution; if there was a solution without
> trade-offs, it would be implemented already.
>
> The Python UTF-8 codec will happily encode half-surrogates; people argue
> that it is a bug that it does so, however, it would help in this
> specific case.
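For readers on a current Python 3, the codec behavior quoted above has since changed. Starting with Python 3.1, the strict UTF-8 codec refuses lone (half) surrogates, and encoding them needs an explicit error handler. The sketch below assumes Python 3.1 or later. It uses `surrogateescape`, the name under which the PEP's proposed "python-escape" handler shipped, plus `surrogatepass`. The filename `caf\xff.txt` is a hypothetical example containing the undecodable byte 0xff.

```python
# Hypothetical filename containing 0xff, which is not valid UTF-8.
raw = b"caf\xff.txt"

# PEP 383-style decoding: the bad byte becomes the half surrogate U+DCFF.
name = raw.decode("utf-8", "surrogateescape")

# Strict UTF-8 refuses the lone surrogate. This is the rare, late
# exception the quoted text warns about when serializing such names.
strict_ok = True
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    strict_ok = False
    print("strict encode failed:", exc.reason)

# "surrogatepass" writes the surrogate as its own 3-byte UTF-8 sequence.
passed = name.encode("utf-8", "surrogatepass")
print(passed)   # b'caf\xed\xb3\xbf.txt'

# "surrogateescape" turns U+DCFF back into the original byte 0xff.
escaped = name.encode("utf-8", "surrogateescape")
print(escaped)  # b'caf\xff.txt'
```

Only `surrogateescape` reproduces the original bytes. The `surrogatepass` output is not valid UTF-8 and differs from what is on disk.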
Can we use this encoding scheme when writing to files as well? We've
turned the filename with undecodable bytes into a string with half
surrogates. Writing that string to a file has to turn those surrogates
back into bytes at some level. Can we use the python-escape error handler
to achieve that?
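As it turned out, the handler proposed here as "python-escape" shipped in Python 3.1 as `surrogateescape`. It can be passed to `open()` through the `errors` argument, so file I/O uses the same scheme. The following is a minimal sketch assuming Python 3.1+. The configuration file name `names.conf` and the sample filename are hypothetical.

```python
import os
import tempfile

# Hypothetical undecodable filename, decoded the PEP 383 way.
raw = b"caf\xff.txt"
name = raw.decode("utf-8", "surrogateescape")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.conf")  # hypothetical config file

    # Writing with errors="surrogateescape" turns U+DCFF back into 0xff.
    # newline="" disables newline translation so the bytes are predictable.
    with open(path, "w", encoding="utf-8", errors="surrogateescape",
              newline="") as f:
        f.write(name + "\n")

    # The file holds the original bytes, not an encoded surrogate.
    with open(path, "rb") as f:
        data = f.read()

    # Reading with the same handler recovers the identical str.
    with open(path, encoding="utf-8", errors="surrogateescape",
              newline="") as f:
        restored = f.read().rstrip("\n")

print(data)              # b'caf\xff.txt\n'
print(restored == name)  # True
```

The trade-off is that the file is then not strictly valid UTF-8. Any other program reading it must tolerate the raw byte, just as it would when reading the directory listing itself.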
-Toshio