[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Fri Sep 19 16:05:47 CEST 2014

Guido van Rossum writes:

 > What is this "it" that you propose to just do? I'm sure I have an
 > opinion on it once you describe it to me.

I'm sorry, I probably shouldn't have taken your name in vain at this
stage.  There are no solid proposals yet; the details of the format
characters, how to use "precision", the symbol to indicate "chunking",
etc. are all under discussion.

Brief summary and links, if you care to read further:

At present there are at least three kinds of proposals on the table
for a __format__ for bytes objects, with the proposals and discussion
being collected in http://bugs.python.org/issue22385.  Eric V. Smith
gave the following summary (edited by me for brevity) in
https://mail.python.org/pipermail/python-ideas/2014-September/029353.html:

    1. Support exactly what the standard types (int, str, float, etc.)
       support, but give slightly different semantics to it.

    2. Support a slightly different format specifier. The downside of
       this is that it might be confusing to some users, who see the
       printf-like formatting as some universal standard. It's also
       hard to document.

    3. Do something radically different. I gave an example on the
       issue tracker [cited above], but I'm not totally serious about
       this.

My "Just Do It" was mostly ignoring the possibility of Eric's #3 (Eric
was even more deprecatory in the issue, saying "although it's insane,
you could ...").  I was specifically referring to Eric's and Andrew
Barnhart's discussion of potential confusion, Eric saying "if it's
different, users used to printf may get confused" and Andrew saying
(among other ideas) "if it's too close to the notation for str, it
could exacerbate the existing confusion between bytes and str".

I don't see the too close/too different issue as something we can
decide without implementing it.  Perhaps experience with a PyPI module
would give guidance, but I'm not optimistic: the kind of user who
would use a PyPI module for this feature is atypical, I think.

                                *****

In somewhat more detail, Nick's original proposal (in that issue)
follows existing format strings very closely:

    "x": display a-f as lowercase digits
    "X": display A-F as uppercase digits
    "#": includes 0x prefix
    ".prec": chunks output, placing a space after every <prec> bytes
    ",": uses a comma as the separator, rather than a space

Further discussion and examples in
https://mail.python.org/pipermail/python-ideas/2014-September/029352.html.
There he made a second proposal, rather different:

    "h": lowercase hex
    "H": uppercase hex
    "A": ASCII (using "." for unprintable & extended ASCII)

    format(b"xyz", "A") -> 'xyz'
    format(b"xyz", "h") -> '78797a'
    format(b"xyz", "H") -> '78797A'

    Followed by a separator and "chunk size":

    format(b"xyz", "h 1") -> '78 79 7a'
    format(b"abcdwxyz", "h 4") -> '61626364 7778797a'

    format(b"xyz", "h,1") -> '78,79,7a'
    format(b"abcdwxyz", "h,4") -> '61626364,7778797a'

    format(b"xyz", "h:1") -> '78:79:7a'
    format(b"abcdwxyz", "h:4") -> '61626364:7778797a'

    In the "h" and "H" cases, you could request a preceding "0x" on the
    chunks:

    format(b"xyz", "h#") -> '0x78797a'
    format(b"xyz", "h# 1") -> '0x78 0x79 0x7a'
    format(b"abcdwxyz", "h# 4") -> '0x61626364 0x7778797a'
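
For anyone who wants to experiment with this second notation, here is
a minimal sketch that reproduces the examples above as a plain
function.  It is not Nick's implementation and not a proposed API: the
name format_bytes and the regular expression are hypothetical, and it
assumes that "printable" means 0x20-0x7e, that the "0x" prefix stays
lowercase for "H", and that "#" is rejected for "A".

```python
import re

# Hypothetical grammar for the tentative notation: a type character,
# an optional "#", then optionally a separator and a chunk size in bytes.
_SPEC = re.compile(
    r"(?P<type>[hHA])(?P<prefix>#?)(?:(?P<sep>[ ,:])(?P<chunk>[1-9][0-9]*))?"
)


def format_bytes(data: bytes, spec: str) -> str:
    """Render `data` according to the tentative "h"/"H"/"A" notation."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    kind = m.group("type")
    prefix = "0x" if m.group("prefix") else ""
    if kind == "A" and prefix:
        # Assumption: the examples only show "#" with "h" and "H".
        raise ValueError('"#" only applies to "h" and "H"')

    # Split into chunks of `chunk` bytes, or keep the data as one chunk.
    if m.group("chunk"):
        size = int(m.group("chunk"))
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        sep = m.group("sep")
    else:
        chunks = [data]
        sep = ""

    def render(chunk: bytes) -> str:
        if kind == "A":
            # Assumption: printable ASCII is 0x20-0x7e; all else is ".".
            return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        digits = chunk.hex()
        if kind == "H":
            digits = digits.upper()
        return prefix + digits

    return sep.join(render(c) for c in chunks)
```

With this sketch, format_bytes(b"abcdwxyz", "h# 4") gives
'0x61626364 0x7778797a', matching the last example above.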

Nick was clear that all of the notation in the above is tentative in
his mind.  The third proposal is from Eric Smith, in
https://mail.python.org/pipermail/python-ideas/2014-September/029353.html
(already cited above):

    Here's my proposal for #2: The format specifier becomes:
    [[fill]align][#][width][separator]][/chunksize][type]