[Python-3000] Pre-PEP: Easy Text File Decoding
Marcin 'Qrczak' Kowalczyk
qrczak at knm.org.pl
Sat Oct 14 20:45:54 CEST 2006
"Martin v. Löwis" <martin at v.loewis.de> writes:
> Marcin 'Qrczak' Kowalczyk schrieb:
>> I've implemented a hack which allows simple programs to "just work" in
>> case of UTF-8. It's a modified encoder/decoder which escapes malformed
>> UTF-8 sequences with '\0' bytes, and thus allows arbitrary byte
>> sequences to round-trip UTF-8 decoding and encoding. It's not used by
>> default and it's never used when "UTF-8" is specified explicitly,
>> because it's not the true UTF-8, but I have an environment variable
>> which says "if the locale is UTF-8, use the modified UTF-8 as the
>> default encoding".
>
> Actually, I think there is a "better" (i.e. more Unicode-like) way:
> use the private-use area.
That changes the interpretation of some filenames which are valid UTF-8
(or, more generally, of texts known not to contain '\0'). My hack is a pure
extension, since standard UTF-8 decoding of such texts can't produce U+0000.
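For illustration, here is a minimal Python sketch of how such NUL-escaping could work. The exact escape format is not spelled out in this message, so the layout below (each malformed byte becomes U+0000 followed by the character with that byte's value) is an assumption, and the names `nulescape`, `decode_modified` and `encode_modified` are hypothetical. It only claims to round-trip input that contains no NUL bytes.

```python
import codecs


def _nul_escape(exc):
    """Decoding error handler: emit U+0000 + chr(byte) for each bad byte.

    Assumed escape layout, not necessarily the one used by the real hack.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Malformed bytes are always >= 0x80, so chr(b) is U+0080..U+00FF.
    return "".join("\x00" + chr(b) for b in bad), exc.end


# "nulescape" is a hypothetical handler name chosen for this sketch.
codecs.register_error("nulescape", _nul_escape)


def decode_modified(data: bytes) -> str:
    """Decode UTF-8, escaping malformed bytes instead of failing."""
    if b"\x00" in data:
        raise ValueError("input must not contain NUL bytes")
    return data.decode("utf-8", errors="nulescape")


def encode_modified(text: str) -> bytes:
    """Inverse of decode_modified: turn escapes back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\x00":
            # An escape must be followed by a character in U+0080..U+00FF.
            if i + 1 >= len(text) or not 0x80 <= ord(text[i + 1]) <= 0xFF:
                raise ValueError("dangling or invalid escape")
            out.append(ord(text[i + 1]))
            i += 2
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)
```

Well-formed UTF-8 without NUL bytes decodes exactly as it would with the standard codec, which is the "pure extension" property; only malformed input takes the escape path.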
> For Py3k, I would like to propose a standard "binary" codec,
> which is an ASCII superset and decodes bytes 00..7F to ASCII,
> and bytes 80..FF to U+EFxx. This would allow bytes to
> round-trip through text.
It's simpler to use the existing ISO-8859-1 encoding.
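To make the comparison concrete, here is a rough Python sketch of the two options. The functions `binary_decode` and `binary_encode` are hypothetical names for the proposed codec, and reading "U+EFxx" as U+EF00 plus the byte value (so 80..FF land on U+EF80..U+EFFF, inside the private-use area) is an assumption. ISO-8859-1 is the existing codec, which maps every byte to the code point of the same value.

```python
def binary_decode(data: bytes) -> str:
    """Proposed mapping (as assumed here): 00..7F -> ASCII, 80..FF -> U+EF80..U+EFFF."""
    return "".join(chr(b) if b < 0x80 else chr(0xEF00 + b) for b in data)


def binary_encode(text: str) -> bytes:
    """Inverse mapping; rejects characters outside the two ranges."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif 0xEF80 <= cp <= 0xEFFF:
            out.append(cp - 0xEF00)
        else:
            raise ValueError(f"cannot encode U+{cp:04X}")
    return bytes(out)


raw = bytes(range(256))

# Both approaches round-trip every possible byte value.
assert binary_encode(binary_decode(raw)) == raw
assert raw.decode("iso-8859-1").encode("iso-8859-1") == raw
```

Both round-trip all 256 byte values. The difference is only in which code points carry the high bytes: private-use characters in one case, U+0080..U+00FF in the other.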
--
__("< Marcin Kowalczyk
\__/ qrczak at knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/