[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Nick Coghlan ncoghlan at gmail.com
Thu May 26 07:42:18 CEST 2011


On Thu, May 26, 2011 at 3:29 AM, INADA Naoki <songofacandy at gmail.com> wrote:
> There are some situations where I want to use bytes as a string in the real world.

Breaking the bytes-are-text mental model is something we deliberately
set out to do with Python 3 (because it is wrong). In today's global
environment, programmers *need* to learn about text encoding issues, since
treating bytes as text without first finding out the encoding is a
surefire way to get unintelligible mojibake. If "What does 'latin-1'
mean?" is a question that gets them there, then that's fine.
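To make the mojibake point concrete, here is a minimal Python 3 sketch (the sample string is arbitrary). Text encoded as UTF-8 and then decoded under the wrong assumption of latin-1 does not raise an error. It silently produces garbage:

```python
# Encode some text as UTF-8, then decode it while wrongly assuming latin-1.
original = "café"
data = original.encode("utf-8")      # b'caf\xc3\xa9'
garbled = data.decode("latin-1")     # no exception, just the wrong text
print(garbled)                       # cafÃ©

# Only the encoding the bytes were actually written in recovers the text.
print(data.decode("utf-8"))          # café
```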

You *cannot* transparently handle data in arbitrary encodings, as the
meanings of the bytes change based on the encoding (this is especially
true when dealing with encodings that are not ASCII-compatible).
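A short Python 3 illustration of the point above (the byte values and codecs are chosen only as examples): a single byte maps to unrelated characters under different single-byte encodings, and an encoding that is not ASCII-compatible, such as UTF-16, does not even preserve plain ASCII bytes.

```python
# The same byte means different characters under different single-byte encodings.
b = b"\xe9"
print(b.decode("latin-1"))   # é
print(b.decode("cp1251"))    # й (Cyrillic)

# UTF-16 is not ASCII-compatible: two ASCII bytes become one code unit.
ascii_bytes = b"hi"
print(ascii_bytes.decode("utf-16-le"))   # a single CJK character, U+6968
print("hi".encode("utf-16-le"))          # b'h\x00i\x00'
```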

That said, decoding and re-encoding via 'ascii' (strict 7-bit) or
'latin-1' (full 8-bit) is the easiest way to handle both str and
bytes input reasonably efficiently. See urllib.parse for examples of
how to do that.
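A minimal sketch of the decode/process/re-encode pattern described above. `split_header` is a hypothetical helper invented for illustration; it is not taken from urllib.parse, whose actual implementation may differ in detail. It relies on latin-1 mapping byte values 0-255 one-to-one onto code points U+0000-U+00FF, so decoding never fails and re-encoding restores the original bytes.

```python
def split_header(line):
    """Hypothetical helper: split 'Name: value' from str or bytes input.

    bytes input is decoded with latin-1, processed as str, and re-encoded,
    so bytes in gives bytes out and str in gives str out.
    """
    is_bytes = isinstance(line, bytes)
    # latin-1 decoding cannot fail: every byte has a matching code point.
    text = line.decode("latin-1") if is_bytes else line
    name, _, value = text.partition(":")
    name, value = name.strip(), value.strip()
    if is_bytes:
        # Re-encoding with the same codec restores the original byte values.
        return name.encode("latin-1"), value.encode("latin-1")
    return name, value


# Every possible byte value survives the latin-1 round trip unchanged.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

print(split_header(b"Content-Type: text/html"))   # (b'Content-Type', b'text/html')
print(split_header("Host: example.com"))          # ('Host', 'example.com')
```

Swapping "latin-1" for "ascii" (strict) gives the 7-bit variant: any byte above 0x7F then raises UnicodeDecodeError instead of passing through, trading generality for earlier error detection.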

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
