[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
Nick Coghlan
ncoghlan at gmail.com
Thu May 26 07:42:18 CEST 2011
On Thu, May 26, 2011 at 3:29 AM, INADA Naoki <songofacandy at gmail.com> wrote:
> There are some situations where I want to use bytes as a string in the real world.
Breaking the bytes-are-text mental model is something we deliberately
set out to do with Python 3 (because it is wrong). In today's global
environment, programmers *need* to learn about text encoding issues, because
treating bytes as text without finding out the encoding first is a
surefire way to get unintelligible mojibake. If "What does 'latin-1'
mean?" is a question that gets them there, then that's fine.
You *cannot* transparently handle data in arbitrary encodings, as the
meanings of the bytes change based on the encoding (this is especially
true when dealing with non-ASCII compatible encodings).
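As a rough illustration of that point (the byte values below are picked
purely for the example, not taken from any real data), the same bytes
come out as different text depending on which codec is assumed, and a
non-ASCII compatible encoding such as UTF-16 does not even keep ASCII
characters as single bytes:

```python
# The same three bytes, interpreted under two different encodings.
raw = b"\xe3\x81\x82"

as_utf8 = raw.decode("utf-8")      # one character: U+3042 (hiragana "a")
as_latin1 = raw.decode("latin-1")  # three characters: U+00E3, U+0081, U+0082

print(len(as_utf8), len(as_latin1))

# A non-ASCII compatible encoding: even plain "A" is no longer the single
# byte 0x41, so byte-level assumptions about ASCII text stop holding.
utf16_bytes = "A".encode("utf-16-le")
print(utf16_bytes)
```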
That said, decoding and reencoding via 'ascii' (strict 7-bit) or
'latin-1' (full 8-bit) is the easiest way to handle both strings and
bytes input reasonably efficiently. See urllib.parse for examples on
how to do that.
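A minimal sketch of that pattern, assuming a made-up helper called
coerce() and a made-up strip_fragment() function (neither name comes from
urllib.parse; they are only here to show the shape of the idea): decode
bytes input via 'latin-1', do the work on str, then encode the result
back so the caller gets the same type it passed in.

```python
def coerce(value, encoding="latin-1"):
    """Return (text, restore) for str or bytes input.

    Hypothetical helper for illustration only. 'latin-1' maps every byte
    0-255 to the code point with the same number, so the round trip is
    lossless for any bytes value.
    """
    if isinstance(value, str):
        return value, lambda s: s
    return value.decode(encoding), lambda s: s.encode(encoding)


def strip_fragment(url):
    """Drop everything from the first '#' on, for str or bytes input."""
    text, restore = coerce(url)
    head, _, _ = text.partition("#")
    return restore(head)


print(strip_fragment("http://example.com/a#frag"))
print(strip_fragment(b"http://example.com/\xff#frag"))
```

Passing encoding="ascii" instead would give the strict 7-bit behaviour:
bytes input containing anything above 0x7F then raises UnicodeDecodeError
rather than being passed through unchanged.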
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia