[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Thu Sep 11 08:47:01 CEST 2014

Let me start with this, from Nick:

> This is not an acceptable change, for two reasons:

> 1. It's a *major* compatibility break. It breaks single source Python 2/3
> development, it breaks doctests, it breaks user expectations.
>

Okay, breaking doctests, I can understand the negative impact. I'm willing
to give up because of this. So, on account of the fragility of doctests,
I suppose, yes, this proposal will never go through. And I feel that's a
shame, because I was never a fan of doctests, either.

Regarding user expectations: as I've already said, yes, the current
behavior matches the expectations of experienced users, who won't stumble
when they see ASCII in their bytes. For all other users, though, it
violates the principle of least astonishment. ("Why are there English
characters in my bytes?")

> 2. It breaks the symmetry between the bytes literal format and their
> representation.

Symmetry is already broken for bytes literal format because the user is
allowed to enter hex codes, even if they map onto printable ASCII
characters:

    >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
    b'Hello, World!'
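
To make that concrete, here is a small sketch (the variable names are mine,
chosen only for illustration) showing that two different spellings of a
literal build the same object, and that repr() hands back only one of them:

```python
# Two spellings of the same bytes literal.
hex_spelling = b'\x48\x65\x6c\x6c\x6f'
ascii_spelling = b'Hello'

# Both literals build an identical bytes object.
same_value = hex_spelling == ascii_spelling

# repr() normalises to the printable-ASCII form, so the hex spelling
# the user typed is not what comes back.
shown = repr(hex_spelling)
print(same_value, shown)
```

So the literal-to-repr mapping is already many-to-one today; the question is
only which spelling repr() picks.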

On Wed, Sep 10, 2014 at 7:35 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> After posting that version, I realised actually making the proposed
> change would be similarly straightforward, and better illustrate the
> core problem with the idea:
>
> $ ./python -c 'import os; print(os.listdir(b"foo"))'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> FileNotFoundError: [Errno 2] No such file or directory: b'\x66\x6f\x6f'
> $ ./python -c 'import os; print(os.listdir(b"Mac"))'
> [b'\x49\x44\x4c\x45', b'\x4d\x61\x6b\x65\x66\x69\x6c\x65\x2e\x69\x6e',
> b'\x54\x6f\x6f\x6c\x73',
> b'\x52\x45\x41\x44\x4d\x45\x2e\x6f\x72\x69\x67',
> b'\x50\x79\x74\x68\x6f\x6e\x4c\x61\x75\x6e\x63\x68\x65\x72',
> b'\x49\x63\x6f\x6e\x73', b'\x52\x45\x41\x44\x4d\x45',
> b'\x45\x78\x74\x72\x61\x73\x2e\x69\x6e\x73\x74\x61\x6c\x6c\x2e\x70\x79',
> b'\x42\x75\x69\x6c\x64\x53\x63\x72\x69\x70\x74',
> b'\x52\x65\x73\x6f\x75\x72\x63\x65\x73']
>

You passed bytes – not an ASCII string – as an argument to os.listdir; it
gave you back bytes, not ASCII strings. You _consented_ to bytes when you
put the b'Mac' in there; therefore, you are responsible for decoding those
bytes.
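
A minimal sketch of what I mean by taking responsibility for decoding. It
assumes a throwaway directory with a single file named README (my own
setup, not Nick's Mac directory) and uses os.fsencode/os.fsdecode for the
conversion:

```python
import os
import tempfile

# Build a temporary directory containing one file, so the example
# does not depend on any real path.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "README"), "w").close()

    # bytes in, bytes out
    raw_names = os.listdir(os.fsencode(tmp))

    # Explicit decoding step, using the filesystem encoding.
    text_names = [os.fsdecode(name) for name in raw_names]

print(raw_names, text_names)
```

If you want readable names, either pass a str path in the first place or
decode as above.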

Yes, all text must be represented as bytes to a computer, but not all bytes
represent text.
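
As an illustration of bytes that are not text, here is a sketch that packs
a 32-bit integer with the struct module. The particular value is one I
picked because its bytes all happen to land in the printable ASCII range:

```python
import struct

# A 32-bit big-endian unsigned integer; this is a number, not text.
packed = struct.pack('>I', 0x48656c6c)

# Every byte happens to be printable ASCII, so the current repr
# displays letters.
shown = repr(packed)

# The values actually stored are plain integers in range(256).
values = list(packed)
print(shown, values)
```

Nothing in that data was ever meant to be read as the letters "Hell", yet
that is what the current repr displays.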

> It's more than just a matter of backwards compatibility, it's a matter
> of asymmetry of impact when the two possible design choices are wrong:
>
> * Using a hex based repr when an ASCII based repr is more appropriate
> is utterly unreadable
> * Using an ASCII based repr when a hex based repr is more appropriate
> is somewhat confusing
>

I prefer to frame it without reference to ASCII. The decision is (well, was) between:

* A representation that is always accurate but sometimes inconvenient

versus

* A representation that is convenient when it is accurate, but is not always
accurate (and is inconvenient when it's inaccurate).

Earlier, Nick, you wrote

> > What you haven't said so far, however, and what I still don't know, is
> > whether or not the core team has already tried providing a method on bytes
> > objects à la the proposed .asciify() for projecting bytes as ASCII
> > characters, and rejected that on the basis of it being too inconvenient for
> > the vast majority of Python use cases.

> That option was never really on the table, as once we decided back to
> switch to a hybrid ASCII representation, the obvious design model to
> use was the Python 2 str type, which has inherently hybrid behaviour,
> and uses the literal form for the "obj == eval(repr(obj))" round trip.

obj == eval(repr(obj)) round-trip behavior is not violated by the proposed
change:

    >>> r = repr(b'Hello, World!')  # output as it would be under the proposed change
    >>> print(r)
    b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
    >>> b'Hello, World!' == eval(r)
    True
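
Since no released interpreter behaves this way, here is a rough sketch that
can be run today. hex_repr is a hypothetical helper of mine standing in for
the proposed repr; it is not anything in CPython:

```python
def hex_repr(data):
    """Hypothetical hex-only repr, a stand-in for the proposed change."""
    return "b'" + "".join("\\x%02x" % byte for byte in data) + "'"

# A few samples, including an empty value, a quote character, and
# every possible byte value.
samples = [b'Hello, World!', b'', b'\x00\xff', b"it's", bytes(range(256))]

# eval() of the hex-only form gives back an equal bytes object.
round_trips = all(eval(hex_repr(s)) == s for s in samples)
print(hex_repr(b'Hi'), round_trips)
```

Because every byte is written as a \xNN escape, there are no quoting corner
cases to worry about, and the round trip holds for every sample I tried.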