[Python-Dev] PEP 461 updates

Stephen J. Turnbull stephen at xemacs.org
Fri Jan 17 03:19:44 CET 2014


Meta enough that I'll take Guido out of the CC.

Nick Coghlan writes:

 > There are plenty of data formats (like SMTP and HTTP) that are
 > constrained to be ASCII compatible,

"ASCII compatible" is a technical term in encodings, which means
"bytes in the range 0-127 always have ASCII coded character semantics,
do what you like with bytes in the range 128-255."[1]
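
A minimal sketch of that definition in Python. The sample strings and the helper name are illustrative assumptions, and the round-trip check is only a necessary condition for ASCII compatibility, not a proof of it:

```python
# Hypothetical helper; SAMPLE is an arbitrary pure-ASCII string.
SAMPLE = "Content-Type: text/plain"

def ascii_text_round_trips(encoding: str) -> bool:
    """Return True if pure-ASCII text encodes to exactly its ASCII bytes."""
    return SAMPLE.encode(encoding) == SAMPLE.encode("ascii")

print(ascii_text_round_trips("utf-8"))      # True
print(ascii_text_round_trips("latin-1"))    # True
print(ascii_text_round_trips("utf-16-le"))  # False

# In UTF-8, bytes below 128 only ever stand for ASCII characters,
# so a colon can be located safely at the byte level.
utf8_line = "Sujet: caf\u00e9".encode("utf-8")
field_name = utf8_line.split(b":")[0]       # b'Sujet'

# In UTF-16-LE, a non-ASCII character can encode to bytes in 0-127:
# U+3A3A is a CJK ideograph, yet both of its bytes equal ord(":").
looks_like_colons = "\u3a3a".encode("utf-16-le") == b"::"  # True
```

UTF-8 and Latin-1 satisfy the definition; UTF-16 does not, which is why byte-level searches for ASCII delimiters are unsafe on it.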

That is not what is meant here and, worse, the loose usage is clearly
confusing in this discussion.  Let's stop using this term to mean

    the data format has elements that are defined to contain only
    bytes with ASCII coded character semantics

(which is the relevant restriction AFAICS -- I don't know of any
ASCII-compatible formats where the bytes 128-255 are used for any
purpose other than encoding non-ASCII characters).  OTOH, if the data
*is* text in an ASCII-compatible encoding, the semantics of the bytes
versions of many of these methods/operations are dubious.
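
To make "dubious" concrete, here is a small sketch comparing str and bytes operations on UTF-8 encoded text. The sample word is an arbitrary choice for illustration:

```python
text = "caf\u00e9"               # 'cafe' with an acute accent on the e
utf8 = text.encode("utf-8")      # b'caf\xc3\xa9'

# str.upper() knows about the non-ASCII letter ...
upper_text = text.upper()
# ... while bytes.upper() only maps the ASCII bytes a-z and leaves
# the two bytes of the accented letter untouched.
upper_bytes = utf8.upper()

print(upper_text == upper_bytes.decode("utf-8"))  # False

# Length and slicing count bytes, not characters.
print(len(text), len(utf8))      # 4 5
truncated = utf8[:4]             # cuts the two-byte sequence in half
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    print("corrupted:", exc.reason)
```

The bytes methods do not fail here; they silently produce a result that is wrong as text, which is the hard case to diagnose.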

A documentation suggestion: It's easy enough to rewrite

 > constrained to be ASCII compatible, either globally, or locally in
 > the parts being manipulated by an application (such as a file
 > header). ASCII incompatible segments may be present, but in ways
 > that allow the data processing to handle them correctly.

as 

    containing 'well-defined segments constrained to be (strictly)
    ASCII-encoded' (aka ASCII segments).

And then you can say 

    <specified bytes methods> are designed for use *only* on bytes
    that are ASCII segments; use on other data is likely to cause
    hard-to-diagnose corruption.

If there are other use cases for "ASCII-compatible data formats" as
defined above (not worrying about codecs, because they are a very
small minority of code-to-be-written at this point), I don't know
about them.  Does anyone?  If there are any, I'll be happy to revise.
If not, the wording above seems to be a precise and intelligible
statement of the restrictions, and one that is useful for the practical
use cases.  And nothing
stops users who think they know what they're doing from using them in
other contexts (which can be documented if they turn out to be broadly
useful).
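
The "ASCII segments" restriction suggested above could look roughly like this in application code. This is only a sketch: the function names and the HTTP-like sample message are assumptions made for illustration, not part of any proposed API.

```python
from typing import Tuple

def is_ascii_segment(data: bytes) -> bool:
    """Return True if every byte is in the range 0-127."""
    return all(b < 0x80 for b in data)

def parse_header_line(line: bytes) -> Tuple[bytes, bytes]:
    """Split a 'Name: value' header, refusing anything but an ASCII segment."""
    if not is_ascii_segment(line):
        raise ValueError("not an ASCII segment; decode it as text instead")
    name, _, value = line.partition(b":")
    # lower() and strip() are safe here because the segment is strictly ASCII.
    return name.strip().lower(), value.strip()

# Hypothetical message: an ASCII header followed by a binary body.
message = b"Content-Type: image/png\r\n\r\n\x89PNG\r\n\x1a\n"
head, _, body = message.partition(b"\r\n\r\n")

name, value = parse_header_line(head)
print(name, value)   # b'content-type' b'image/png'
# The body is not an ASCII segment, so it is passed through untouched.
print(is_ascii_segment(body))  # False
```

The explicit check turns silent corruption into an immediate error when a non-ASCII segment reaches code written for ASCII segments.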

Footnotes: 
[1]  "ASCII coded character semantics" is of course mildly ambiguous
due to considerations like EOL conventions.  But "you know what I'm
talking about".


