[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Wed Sep 10 20:35:25 CEST 2014

I originally wrote this late last night but realized today that I only
sent this reply to Terry Reedy, not to python-ideas. (Apologies, Terry
– I didn't mean to single you out with my rant!)

I'm reposting it in full, below. Some of these ideas have already been
raised by others and counter-arguments already posed. I still feel I
have not seen some of these points directly addressed, namely, the
unreasonableness of seeing bytes from floating point numbers as ASCII
characters, and the sanity of the API I counter-propose.

Message now appears below:

On Wed, Sep 10, 2014 at 1:11 AM, Terry Reedy <tjreedy at udel.edu> wrote:
>
> I agree with Chris Lasher's basic point, that the representation of bytes confusingly contradicts the idea that bytes are bytes.  But it is not going to change.

Unless the printable representation of bytes objects is part of the
language specification for Python 3, it's an implementation detail
and thus a candidate for change, especially if the BDFL wills it so.
Consider me optimistic that we can change it, or I would have just
posted yet another "Python 3 gets it all wrong" blog post to the web
instead of writing this pre-proposal. :-)

>
>
>
> On 9/10/2014 3:56 AM, Cory Benfield wrote:
>>
>> On 10 September 2014 08:45, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>>
>>> memoryview.cast can be a potentially useful tool for that :)
>>
>>
>> Sure, and so can binascii.hexlify (which is what I normally use).
>
>
> See http://bugs.python.org/issue9951 to add bytes.hex or .tohex as more or less the inverse of bytes.fromhex or even have hex(bytes) work.  This change *is* possible and I think we should pick one of the suggestions for 3.5.
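
For reference, here is a small sketch of what the tools already named
in this thread give us today: binascii.hexlify for the hex view and
bytes.fromhex for the way back. Nothing here is part of any proposal;
the variable names are only for illustration.

```python
import binascii

data = b'Hello, World!'

# hexlify returns a bytes object made of ASCII hex digits
hex_text = binascii.hexlify(data)
print(hex_text)                 # b'48656c6c6f2c20576f726c6421'

# bytes.fromhex wants a str, so decode the hex digits first
round_tripped = bytes.fromhex(hex_text.decode('ascii'))
print(round_tripped == data)    # True

# iterating over bytes already yields plain integers
print(list(data[:5]))           # [72, 101, 108, 108, 111]
```

It works, but it is an extra import and an extra call every time I
want to look at what is actually in the object.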

Here's the API Issue 9951 is proposing:

    >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
    b'Hello, World!'
    >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'.tohex()
    b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
    >>> b'Hello, World!'
    b'Hello, World!'
    >>> b'Hello, World!'.tohex()
    b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'

I'll tell you what: here's the API of my counter-proposal:

    >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
    b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
    >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'.asciify()
    b'Hello, World!'
    >>> b'Hello, World!'
    b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
    >>> b'Hello, World!'.asciify()
    b'Hello, World!'

Here's the prose description of my counter-proposal: add a method to
the bytes object called `.asciify`, that returns a printable
representation of the bytes, where bytes mapping to printable ASCII
characters are displayed as ASCII characters, and the remainder are
given as hex codes. That is, .asciify() should round-trip a bytes
literal. This frees up repr() to do what universally makes sense on a
series of bytes: state the bytes!
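
To make the behavior concrete, here is a rough sketch written as two
plain functions. Both names (hex_repr and asciify) are hypothetical
stand-ins for the proposed repr() and the proposed method, and they
return str rather than changing the bytes type itself. I am also
assuming that the backslash and the single quote get hex-escaped so
that the output is always a valid bytes literal.

```python
import ast


def hex_repr(data):
    # Hypothetical stand-in for the proposed repr(): every byte as \xNN.
    return "b'" + "".join("\\x{:02x}".format(byte) for byte in data) + "'"


def asciify(data):
    # Hypothetical stand-in for the proposed .asciify() method:
    # printable ASCII shown as characters, everything else as \xNN.
    parts = []
    for byte in data:
        char = chr(byte)
        # Assumption: escape backslash and single quote so the result
        # can always be read back as a bytes literal.
        if 0x20 <= byte <= 0x7e and char not in "\\'":
            parts.append(char)
        else:
            parts.append("\\x{:02x}".format(byte))
    return "b'" + "".join(parts) + "'"


sample = b'Hello, World!'
print(hex_repr(sample))   # b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
print(asciify(sample))    # b'Hello, World!'

# Both forms round-trip through a bytes literal for every byte value.
every_byte = bytes(range(256))
assert ast.literal_eval(hex_repr(every_byte)) == every_byte
assert ast.literal_eval(asciify(every_byte)) == every_byte
```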

Marc-Andre Lemburg said:
>
> A definite -1 from me on making repr(b"Hello World") harder to read than necessary.

Okay, but a definite -1e6 from me on making my Python interpreter do this:

    >>> import struct
    >>> my_packed_bytes = struct.pack('ffff', 3.544294848931151e-12,
    ...     1.853266900760489e+25, 1.6215185358725202e-19, 0.9742483496665955)
    >>> my_packed_bytes
    b'Why, Guido? Why?'
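
For comparison, this is a sketch of the view I would rather get when
debugging those four floats: one hex group per packed value. I use
'<ffff' here instead of 'ffff' so the byte order is pinned to
little-endian regardless of the machine; the names values, packed and
groups are just for illustration.

```python
import binascii
import struct

values = (3.544294848931151e-12, 1.853266900760489e+25,
          1.6215185358725202e-19, 0.9742483496665955)

# '<' forces little-endian, standard-size 4-byte floats
packed = struct.pack('<ffff', *values)

# Slice the buffer into 4-byte chunks, one per float, and show each as hex
groups = [binascii.hexlify(packed[i:i + 4]).decode('ascii')
          for i in range(0, len(packed), 4)]
print(' '.join(groups))   # 5768792c 20477569 646f3f20 5768793f
```

Nobody would mistake that output for a message to the BDFL.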

I do understand the utility of peering into ASCII text, but as Cory
Benfield stated earlier:

> I'm saying that I don't get to do debugging with a simple
> print statement when using the bytes type to do actual binary work,
> while those who are doing sort-of binary work do.

Does the inconvenience of having to explicitly call the .asciify()
method on a bytes object justify the current behavior for repr() on a
bytes object? The privilege of being lazy is obstructing the right to
see what we've actually got in the bytes object, and is jeopardizing
the very argument that "bytes are not strings".

On Wed, Sep 10, 2014 at 10:51 AM, Cory Benfield <cory at lukasa.co.uk> wrote:
> On 10 September 2014 17:59, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> So does 0xDEADBEEF, but actually that's *not* text, it's a 32-bit
>> pointer, conveniently invalid on most 32-bit architectures and very
>> obvious when it shows up in a backtrace.  Do you see an impedance
>> mismatch in the C community because of that?
>>
>> In fact, *all* bytes "look like text", because *you can't see them
>> until they're converted to text by repr()*!  This is the key to the
>> putative "impedance mismatch" -- it's perceived as such when people
>> don't distinguish the map from the territory.
>
> I apologise, I was insufficiently clear. I mean that interaction with
> the bytes type in Python has a lot of textual aspects to it. This is a
> *deliberate* decision (or at least the documentation makes it seem
> deliberate), and I can understand the rationale, but it's hard to be
> surprised that it leads developers astray.
>
> Also, while I'm being picky, 0xDEADBEEF is not a 32-bit pointer, it's
> a 32-bit something. Its type is undefined in that expression. It has a
> standard usage as a guard word, but still, let's not jump to
> conclusions here!
>
> I accept your core point, however, which I consider to be this:
>
>> The issue that sometimes it's easier to read hex than ASCII mixed with
>> other stuff (hex escapes or Latin-1) is true enough, though.  But it's
>> not about an impedance mismatch, it's a question of what does *this*
>> developer consider to be the convenient repr for *that* task.
>
> This is definitely true, which I believe I've already admitted in this
> thread. I do happen to believe that having it be hex would provide a
> better pedagogical position ("you know this isn't text because it
> looks like gibberish!"), but that ship sailed a long time ago.