[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Nick Coghlan ncoghlan at gmail.com
Thu Jan 9 11:32:26 CET 2014


On 9 Jan 2014 11:29, "INADA Naoki" <songofacandy at gmail.com> wrote:
>
>
>> And I think everyone was well intentioned - and python3 covers most of the
>> bases, but working with binary data is not only a "wire-protocol programmer's"
>> problem.

If you're working with binary data, use the binary API offered by bytes,
bytearray and memoryview.
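
As a rough illustration of that binary API (a sketch only; the 4-byte
record layout below is a made-up example, not any real protocol):

```python
import struct

# Hypothetical record: a 2-byte big-endian length, then that many payload bytes.
packet = bytearray(b"\x00\x02hi")

# struct.unpack_from reads a binary field without any text decoding.
(length,) = struct.unpack_from(">H", packet, 0)

# A memoryview slice shares the underlying buffer instead of copying it.
view = memoryview(packet)
payload = view[2:2 + length]

# Writing through the view mutates the original bytearray in place.
payload[0] = ord("H")
result = bytes(packet)
```

Nothing here ever becomes a str: the data stays binary from start to finish.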

>> Needing a library to wrap bytesthing.format('ascii', 'surrogateescape')
>> or some such thing makes python3 less approachable for those who haven't
>> learned that yet - which was almost all of us at some point when we started
>> programming.
>
> Totally agree with you.

If you're on a relatively modern OS, everything should be UTF-8 and you
should be fine as a beginner.

When you start encountering malformed data, Python 3 should throw an error,
and provide an opportunity to learn more (by looking up the error message),
where Python 2 would silently corrupt the data stream.
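
A small sketch of the Python 3 side of that, assuming the data was
expected to be UTF-8 but actually arrived as Latin-1 (the sample bytes
are invented for illustration):

```python
# Latin-1 encoding of "caf" plus e-acute; a lone 0xE9 byte is not valid UTF-8.
malformed = b"caf\xe9"

try:
    malformed.decode("utf-8")
    error_name = None
except UnicodeDecodeError as exc:
    # The exception carries the codec, offset and reason, which gives a
    # beginner something concrete to look up.
    error_name = type(exc).__name__
    bad_offset = exc.start

# Well-formed UTF-8 round-trips without complaint.
well_formed = "caf\u00e9".encode("utf-8")
round_trip = well_formed.decode("utf-8")
```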

Python 2 enshrined a data model eminently suitable for boundary code that
dealt with ASCII compatible binary protocols (like web frameworks) as the
default text model. Application code then needed to take special steps to
get correct behaviour for the full Unicode range. In essence, the Python 2
text model is the POSIX text model with Unicode support bolted on to the
side to make it at least *possible* to write correct application code.

This is completely backwards. Web applications vastly outnumber web
frameworks, and the same goes for every other domain: applications are
vastly more common than the libraries and frameworks that handle data
transformations at system boundaries on their behalf, so making the latter
easier to write at the expense of the former is a deeply flawed design
choice.

So Python 3 reverses the situation: the core text model is now more
appropriate for the central application code, *after* the boundary code has
cleaned up the murky details of wire protocols and file formats.
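
One way to picture that split, as a minimal sketch (the function names
are hypothetical, and UTF-8 is assumed as the wire encoding):

```python
def read_boundary(raw: bytes) -> str:
    """Boundary code: turn wire bytes into text exactly once."""
    return raw.decode("utf-8")


def shout(text: str) -> str:
    """Application code: sees only str, never the wire encoding."""
    return text.upper() + "!"


def write_boundary(text: str) -> bytes:
    """Boundary code: turn text back into wire bytes on the way out."""
    return text.encode("utf-8")


wire_in = "h\u00e9llo".encode("utf-8")  # "hello" with an e-acute
wire_out = write_boundary(shout(read_boundary(wire_in)))
```

Only the two boundary functions know which codec is in use; the
application function in the middle works on the full Unicode range.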

This is pretty easy to deal with for *new* Python 3 code, since you just
write things to deal with either bytes or text as appropriate.

However, there is some code written for Python 2 that relies more heavily
on the ability to treat ASCII compatible binary data as both binary data
*and* as text. This is the use case that Python 3 treats as a more
specialised use case (perhaps benefitting from a specialised third party
type), whereas Python 2 supports it by default.

This is also the use case that relied most heavily on implicit encoding and
decoding, since that's the mechanism that allows the 8-bit and Unicode
paths to share string literals.
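
To make the Python 3 behaviour concrete, here is a hedged sketch (the
header value is just an example) of what happens when the two types
meet without an explicit conversion:

```python
header = b"Content-Length: "  # ASCII compatible binary data
value = "42"                  # text

try:
    header + value            # Python 3 does no implicit encoding here
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The explicit form names the codec at the point of conversion.
line = header + value.encode("ascii")
```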

Cheers,
Nick.
