[Python-Dev] PEP 460 reboot

Thu Jan 16 01:35:42 CET 2014

On 15 Jan 2014 20:58, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>
> Aside: OK, Guido, ya got me.
>
> I have a separate screed recounting the reasons for my apostasy, but
> that's probably not interesting any more.  I'll send it to individuals
> on request.
>
>  > But in terms of explaining the text model, that
>  > separation is important enough that
>  >
>  >     (1)  We should be reluctant to strengthen the
>  >          "it's really just ASCII" messages.
>
> True.  I think the right message is "Unless you know why you
> *desperately* want this, not only don't you need it, but using it is
> the Python equivalent of skydiving without a parachute."
>
> N.B. Don't take the metaphor as an insult.  I think it's become clear
> that those who "desperately want this" not only use parachutes, they
> pack their own.  No need to worry about them.
>
>  >     (2)  It *may* be worth creating a virtual
>  >          split in the documentation.
>
> Please don't.  All we need to tell naive users is:
>
>     Look at the structure of the bytes.  If that structure is "text",
>     convert to str using .decode().  Please don't use bytes.
>
>     If that structure isn't text, you're in a specialist domain, and
>     it's your problem.  Many structured uses of bytes use ASCII-
>     encoded keywords: we provide bytes methods for handling them, but
>     you *must* be aware that these methods *cannot* distinguish "bytes
>     representing text encoded as ASCII" from "any old bytes".  Be
>     warned: They will happily -- and silently -- corrupt the latter.
>     Make sure you respect the higher-level structure of your data when
>     using them.

Yes, I'm currently thinking the appropriate approach to the docs will be to
remove the current "these have most of the str methods too" paragraph for
binary sequences and instead create three completely explicit lists of
methods:

- provided, works with arbitrary data
- provided, assumes the use of an ASCII compatible data format
- not provided

PEP 461 would add a fourth category: provided, but with more
restricted semantics.
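
A minimal sketch of how the first three lists might look in practice,
assuming Python 3. The variable names and sample values are purely
illustrative, and the choice of which methods to show under each heading
is an assumption, not the final documentation layout. It also shows the
silent corruption Stephen warns about above.

```python
import struct

# Hypothetical sample data: an ASCII keyword and a packed binary record.
header = b"content-length"
payload = struct.pack("<HH", 0x6162, 0x0001)  # two little-endian 16-bit ints

# 1. Provided, works with arbitrary data: no assumption about content.
parts = b"\x00\xff\x00".split(b"\xff")
patched = payload.replace(b"\x01\x00", b"\x02\x00")

# 2. Provided, assumes an ASCII compatible data format.
keyword = header.upper()  # fine: the bytes really are ASCII text

# The same method on binary data "works", but silently changes the values,
# because bytes 0x61 and 0x62 happen to look like ASCII 'a' and 'b'.
corrupted = struct.unpack("<HH", payload.upper())

# 3. Not provided: these str methods have no bytes counterpart.
missing = [name for name in ("encode", "format", "casefold")
           if not hasattr(bytes, name)]

print(parts, patched, keyword, corrupted, missing)
```

The second category is the one that needs the warning: nothing in
`payload.upper()` raises, so the caller has to know from the higher-level
structure of the data that the call is appropriate.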

Cheers,
Nick.

>
>  >     Virtual subclass ASCIIStructuredBytes
>  >     ====================================
>  >
>  >     One particularly common use of bytes is to represent
>  >     the contents of a file, or of a network message.  In
>  >     these cases, the bytes will often represent Text
>  >     *in a specific encoding* and that encoding will usually
>  >     be a superset of ASCII.  Rather than create and support
>  >     an ASCIIStructuredBytes subclass, Python simply added
>  >     support for these use cases straight to Bytes objects,
>  >     and assumes that this support simply won't be used when
>  >     it does not make sense. For example, bytes literals
>
> This is going quite the wrong direction, I think.  The only people who
> should care about "Text *in a specific encoding* and that encoding
> will usually be a superset of ASCII" are codec writers, and by now
> writing those is a very rare task.  Everybody else uses ASCII keywords
> in a simple formal language.
>
>  >     *could* be used to construct a sound sample, but the
>  >     literals will be far easier to read when they are used
>  >     to represent (encoded) ASCII text, such as "OPEN".