[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Wed Sep 10 09:42:32 CEST 2014

On 10 September 2014 08:04, Chris Lasher <chris.lasher at gmail.com> wrote:
> Why did the CPython core developers decide to force the display of
> ASCII characters in the printable representation of bytes objects in
> CPython 3?

I'd argue this is symptomatic of something that got mentioned in the
lengthy discussions around PEP 461: namely, that Python's bytestrings
are really still very stringy. For example, they retain their 'upper'
method, which is so totally bizarre in the context of bytes that it
causes me to mentally segfault every time I see it:

>>> a = b'hi there'
>>> a.upper()
b'HI THERE'

As Nick mentioned, this is fundamentally because of protocols like
HTTP/1.1, which are a weird hybrid of text-based and binary that is
only simple if you assume ASCII everywhere. (Of course, HTTP does not
assume ASCII everywhere, but that's because it's wildly
underspecified.)
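
To show why those stringy methods stick around, here is a rough sketch of
picking apart one HTTP/1.1-style header line without ever decoding it.
The header line is made up for illustration, and this is nowhere near a
real parser:

```python
# A single header line as it might arrive off the wire (illustrative value).
raw = b'Content-Type: text/html\r\n'

# partition() splits on the first colon and stays in bytes throughout.
name, _, value = raw.partition(b':')

# lower() and strip() only touch ASCII letters and ASCII whitespace,
# which is exactly the "assume ASCII everywhere" shortcut.
name = name.lower()
value = value.strip()

print(name, value)
```

None of that needs a codec, which is convenient right up until a header
carries something that isn't ASCII.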

I doubt you'll get far with this proposal on this list, which is a
shame because I think you have a point. There is an impedance mismatch
between the Python community saying "Bytes are not text" and the fact
that, wow, they really do look like they are sometimes!

For what it's worth, Nick has made this comment:

> Primarily because it's incredibly useful for debugging ASCII based
> binary formats (which covers many network protocols and file formats).

This is true, but it cuts both ways: it makes it a lot *harder* to
debug pure-binary network formats (like HTTP/2). I basically need an
ASCII table in front of me to debug using the printed representation
of a bytestring, because printable characters keep getting thrown into
my nice hex output. Sadly, you can't please everyone.
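
To make that concrete, here is a minimal sketch. The nine bytes are made
up for illustration (loosely shaped like a binary frame header, not taken
from any real capture), and binascii.hexlify is just one way to get an
unambiguous hex dump:

```python
import binascii

# Nine arbitrary bytes; three of them happen to fall in printable or
# escapable ASCII territory (0x41 'A', 0x25 '%', 0x0D '\r').
frame = bytes([0x00, 0x00, 0x41, 0x01, 0x25, 0x00, 0x00, 0x00, 0x0D])

# The default repr mixes \x escapes with ASCII characters.
mixed = repr(frame)

# hexlify gives two hex digits per byte, with no ASCII substitution.
pure_hex = binascii.hexlify(frame).decode('ascii')

print(mixed)     # b'\x00\x00A\x01%\x00\x00\x00\r'
print(pure_hex)  # 00004101250000000d
```

In the first line you have to know that 'A' is 0x41, '%' is 0x25 and
'\r' is 0x0d before you can read the length and flags back out.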