[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Nick Coghlan ncoghlan at gmail.com
Fri May 27 08:45:13 CEST 2011


On Fri, May 27, 2011 at 4:14 PM, INADA Naoki <songofacandy at gmail.com> wrote:
> I love unicode and use unicode when I can use it.
> But this is a problem in the real world.
> For example, Python 2 is convenient for analyzing line-based logs
> containing several different encodings. Python 3

...deliberately makes that difficult because it is *wrong*.

Binary files containing a mixture of encodings cannot be safely
treated as text. The closest it is possible to get is to support only
ASCII-compatible encodings by decoding the data as ASCII with the
"surrogateescape" error handler, so that bytes with the high-order bit
set can be faithfully reproduced on re-encoding. However, such code
will potentially fail once it encounters an encoding that is not
ASCII compatible, such as UTF-16 or UTF-32.
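
A rough sketch of that approach, for illustration only. The sample log
lines and the roundtrip() helper are made up for this example; only the
"surrogateescape" error handler itself is part of Python 3. It shows the
bytes surviving a decode/encode round trip, and UTF-16 data decoding
without an error but not as the text it actually represents.

```python
# Hypothetical sample data: the same log line in two ASCII-compatible encodings.
raw_lines = [
    "caf\u00e9 ok\n".encode("utf-8"),    # b'caf\xc3\xa9 ok\n'
    "caf\u00e9 ok\n".encode("latin-1"),  # b'caf\xe9 ok\n'
]


def roundtrip(raw: bytes) -> bytes:
    """Decode as ASCII, smuggling high-bit bytes through as lone surrogates."""
    text = raw.decode("ascii", errors="surrogateescape")
    # The ASCII portions can be inspected as ordinary text.
    assert text.endswith(" ok\n")
    # Re-encoding with the same handler restores the original bytes.
    return text.encode("ascii", errors="surrogateescape")


results = [roundtrip(raw) for raw in raw_lines]
roundtrip_is_lossless = results == raw_lines

# UTF-16 is not ASCII compatible: every ASCII character carries a NUL byte.
utf16 = "ok\n".encode("utf-16-le")
decoded = utf16.decode("ascii", errors="surrogateescape")
# No exception is raised here, but the result is not the original text.
utf16_decodes_as_intended = decoded == "ok\n"
```

Note that in the UTF-16 case nothing raises an exception. The decoded
string simply has NUL characters interleaved with the real ones, so any
text processing built on top of it will quietly misbehave.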

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


