[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Stephen J. Turnbull stephen at xemacs.org
Mon May 30 04:39:45 CEST 2011


Greg Ewing writes:

 > How would ascii behave when mixed with unicode strings? Should it
 > automatically coerce to unicode,

Definitely not!  Bytes are not text, and the programmer must say when
they want those bytes decoded.  The Python translator must not be
asked to guess.

 > or should an explicit decode() be required?

Simplest.
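
For illustration, here is a minimal sketch of the "explicit decode
required" behavior, using Python 3's existing bytes and str types as a
stand-in for the proposed ascii type.  The host name is a made-up
example, not something from this thread.

```python
greeting = b'HELO '            # bytes, as they would travel over a socket
fqdn = 'mail.example.org'      # str (Unicode text); hypothetical host name

# Mixing bytes and str without saying how to convert raises TypeError:
# the interpreter refuses to guess an encoding.
try:
    greeting + fqdn
    mixed_ok = True
except TypeError:
    mixed_ok = False

# An explicit decode states the programmer's intent unambiguously.
as_text = greeting.decode('ascii') + fqdn
```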

But IMHO worth considering is an implicit coercion of Unicode to ascii
via encode() with strict errors.  Remember, Unicode is an invertible
mapping of characters to abstract integers, which may be represented
in various different ways, such as bytes, 32-bit words, or UTF-8.  So
in some sense there is no violation of the Unicode type here.  Sorry,
I can't explain more clearly at the moment, but I have a strong sense
that coercion (ASCII) bytes -> Unicode *changes* or maybe even
"destroys" the type of the byte, while the coercion (ASCII) Unicode ->
bytes takes an abstract type "Unicode" and refines to a concrete type
"bytes".  Among other things, this is always reversible.
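
A rough sketch of what that coercion would amount to if written out by
hand with today's str.encode(); the implicit version is only the idea
floated above, not an existing feature.  Strict errors mean that
non-ASCII text fails loudly rather than being mangled.

```python
text = 'HELO '

# Unicode -> bytes with strict errors: succeeds only for pure ASCII text.
wire = text.encode('ascii', errors='strict')

# Whenever the encode succeeds, decoding recovers the original text.
round_trip = wire.decode('ascii')

# Non-ASCII text is rejected instead of being silently altered.
try:
    'caf\u00e9'.encode('ascii', errors='strict')
    non_ascii_ok = True
except UnicodeEncodeError:
    non_ascii_ok = False
```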

This takes into account the common usage of punning natural language
encoded in ASCII on binary protocol magic numbers.

Then one could write stuff like

    my_pipe.write('HELO ' + my_fqdn)

while true pedants would of course write

    my_pipe.write(b'HELO ' + my_fqdn)

This doesn't explain how to make it easy to ensure that my_fqdn is
bytes, of course, and that makes me uneasy about whether this would
actually be useful, or merely confusing.  (However, there are use
cases where it is claimed that 'HELO ' is needed both as str and as
bytes.)
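
One possible way to make sure my_fqdn is bytes, sketched with a
hypothetical helper as_ascii_bytes() (not part of any library or of
this proposal), an in-memory buffer standing in for my_pipe, and a
made-up host name:

```python
import io


def as_ascii_bytes(value):
    """Return value as bytes; encode str as strict ASCII (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    # Raises UnicodeEncodeError if value contains non-ASCII characters.
    return value.encode('ascii', errors='strict')


my_pipe = io.BytesIO()           # stand-in for a real pipe or socket file
my_fqdn = 'mail.example.org'     # hypothetical host name, held as str

my_pipe.write(b'HELO ' + as_ascii_bytes(my_fqdn))
sent = my_pipe.getvalue()
```

This keeps the conversion explicit at the boundary, at the cost of
having to remember to call the helper everywhere.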



