[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Stephen J. Turnbull stephen at xemacs.org
Wed Sep 10 18:59:52 CEST 2014


Barry Warsaw writes:
 > On Sep 10, 2014, at 08:42 AM, Cory Benfield wrote:
 > 
 > >I doubt you'll get far with this proposal on this list, which is a
 > >shame because I think you have a point. There is an impedance mismatch
 > >between the Python community saying "Bytes are not text" and the fact
 > >that, wow, they really do look like they are sometimes!

So does 0xDEADBEEF, but actually that's *not* text, it's a 32-bit
pointer, conveniently invalid on most 32-bit architectures and very
obvious when it shows up in a backtrace.  Do you see an impedance
mismatch in the C community because of that?

In fact, *all* bytes "look like text", because *you can't see them
until they're converted to text by repr()*!  This is the key to the
putative "impedance mismatch" -- it's perceived as such when people
don't distinguish the map from the territory.
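
To make the map/territory point concrete, here is a minimal sketch.
The sample data and the variable names are my own, chosen only for
illustration.  The same seven octets are rendered as text in two ways,
and neither rendering *is* the bytes:

```python
# Seven octets: the territory.  The values are an arbitrary illustration.
data = bytes([0x47, 0x45, 0x54, 0x20, 0x2F, 0x00, 0xFF])

# Two maps of that territory, both of them ordinary str objects.
as_default = repr(data)       # ASCII where printable, hex escapes elsewhere
as_ints = repr(list(data))    # every octet shown as a decimal integer

print(as_default)   # b'GET /\x00\xff'
print(as_ints)      # [71, 69, 84, 32, 47, 0, 255]
```

Both outputs are text; they differ only in which convention was used
to draw the map.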

The observation that hex is sometimes easier to read than ASCII mixed
with other stuff (hex escapes or Latin-1) is true enough, though.  But
that's not about an impedance mismatch, it's a question of which repr
*this* developer considers convenient for *that* task.  I just
don't see hex-based use cases coming close to being as important as
the convenience for those cases where the structure being imposed on
some bytes is partly derived from English.  The current default repr
is, I believe, the right default repr.

That doesn't mean that it would be a terrible idea to provide other
reprs in the stdlib (although it is, after all, a one-liner!).
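
For what it's worth, a sketch of what such a one-liner might look
like.  The names hex_repr and HexBytes are hypothetical (nothing by
those names exists in the stdlib), and this is only one of several
plausible spellings:

```python
import binascii

def hex_repr(b):
    # Hypothetical helper: space-separated hex pairs, one per octet.
    # Iterating over a bytes object in Python 3 yields integers.
    return " ".join("%02x" % octet for octet in b)

class HexBytes(bytes):
    """Hypothetical bytes subclass that shows hex instead of ASCII."""
    def __repr__(self):
        # hexlify returns bytes, so decode it to get a str for repr.
        return "HexBytes(%s)" % binascii.hexlify(self).decode("ascii")

print(hex_repr(b"GET /\r\n"))     # 47 45 54 20 2f 0d 0a
print(repr(HexBytes(b"GET /\r\n")))   # HexBytes(474554202f0d0a)
```

The subclass variant only changes what the interactive prompt and
logging show; the object still behaves as bytes everywhere else.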

 > That's the nature of wire protocols - they're like quantum particles,
 > exhibiting both bytes-like and string-like behavior.

I find the analogy picturesque but unconvincing.  Wire protocols are
punctuated *by design* with European (mostly English) words, acronyms,
and abbreviations, because (a) it's convenient for syntax to be
mnemonic, (b) the arbitrary standard for network streams is
octets, and you can't fit much more than an English character into an
octet, and (c) historically, English-speakers got there first (and had
economic hegemony on their side, too).

 > You can't look too closely, and they have spooky action at a
 > distance too.  For the email protocols at least, you also have
 > mind-crushing singularities.

Doom, gloom, DMARC, and boom!  But I guess you were referring to
From-stuffing, not From-munging.<wink/>


