[Python-checkins] r85021 - peps/trunk/pep-0333.txt

Mon Sep 27 01:44:39 CEST 2010

Author: phillip.eby
Date: Mon Sep 27 01:44:39 2010
New Revision: 85021

Log:
Revert Python 3 changes


Modified:
   peps/trunk/pep-0333.txt

Modified: peps/trunk/pep-0333.txt
==============================================================================

--- peps/trunk/pep-0333.txt	(original)
+++ peps/trunk/pep-0333.txt	Mon Sep 27 01:44:39 2010
@@ -142,51 +142,6 @@
 introspected upon.
 
 
-A Note On String Types
-----------------------
-
-In general, HTTP deals with bytes, which means that this specification
-is mostly about handling bytes.
-
-However, the content of those bytes often has some kind of textual
-interpretation, and in Python, strings are the most convenient way
-to handle text.
-
-But in many Python versions and implementations, strings are Unicode,
-rather than bytes.  This requires a careful balance between a usable
-API and correct translations between bytes and text in the context of
-HTTP...  especially to support porting code between Python
-implementations with different ``str`` types.
-
-WSGI therefore defines two kinds of "string":
-
-* "Native" strings (which are always implemented using the type
-  named ``str``) that are used for request/response headers and
-  metadata
-
-* "Bytestrings" (which are implemented using the ``bytes`` type
-  in Python 3, and ``str`` elsewhere), that are used for the bodies
-  of requests and responses (e.g. POST/PUT input data and HTML page
-  outputs).
-
-Do not be confused however: even if Python's ``str`` type is actually 
-Unicode "under the hood", the *content* of native strings must
-still be translatable to bytes via the Latin-1 encoding!  (See
-the section on `Unicode Issues`_ later in  this document for more
-details.)
-
-In short: where you see the word "string" in this document, it refers 
-to a "native" string, i.e., an object of type ``str``, whether it is 
-internally implemented as bytes or unicode.  Where you see references 
-to "bytestring", this should be read as "an object of type ``bytes`` 
-under Python 3, or type ``str`` under Python 2".
-
-And so, even though HTTP is in some sense "really just bytes", there
-are  many API conveniences to be had by using whatever Python's
-default  ``str`` type is.
-
-
-
 The Application/Framework Side
 ------------------------------
 
@@ -209,15 +164,13 @@
 Here are two example application objects; one is a function, and the
 other is a class::
 
-    # this would need to be a byte string in Python 3:
-    HELLO_WORLD = "Hello world!\n"  
-
     def simple_app(environ, start_response):
         """Simplest possible application object"""
         status = '200 OK'
         response_headers = [('Content-type', 'text/plain')]
         start_response(status, response_headers)
-        return [HELLO_WORLD]
+        return ['Hello world!\n']
+
 
     class AppClass:
         """Produce the same output, but using a class
@@ -242,7 +195,7 @@
             status = '200 OK'
             response_headers = [('Content-type', 'text/plain')]
             self.start(status, response_headers)
-            yield HELLO_WORLD
+            yield "Hello world!\n"
 
 
 The Server/Gateway Side
@@ -290,7 +243,7 @@
                      sys.stdout.write('%s: %s\r\n' % header)
                  sys.stdout.write('\r\n')
 
-            sys.stdout.write(data)  # TODO: this needs to be binary on Py3
+            sys.stdout.write(data)
             sys.stdout.flush()
 
         def start_response(status, response_headers, exc_info=None):
@@ -373,7 +326,7 @@
         """Transform iterated output to piglatin, if it's okay to do so
 
         Note that the "okayness" can change until the application yields
-        its first non-empty bytestring, so 'transform_ok' has to be a mutable
+        its first non-empty string, so 'transform_ok' has to be a mutable
         truth value.
         """
 
@@ -388,7 +341,7 @@
 
         def next(self):
             if self.transform_ok:
-                return piglatin(self._next())   # call must be byte-safe on Py3
+                return piglatin(self._next())
             else:
                 return self._next()
 
@@ -423,7 +376,7 @@
 
                 if transform_ok:
                     def write_latin(data):
-                        write(piglatin(data))   # call must be byte-safe on Py3
+                        write(piglatin(data))
                     return write_latin
                 else:
                     return write
@@ -473,7 +426,7 @@
 attempting to display an error message to the browser.
 
 The ``start_response`` callable must return a ``write(body_data)``
-callable that takes one positional parameter: a bytestring to be written
+callable that takes one positional parameter: a string to be written
 as part of the HTTP response body.  (Note: the ``write()`` callable is
 provided only to support certain existing frameworks' imperative output
 APIs; it should not be used by new applications or frameworks if it
@@ -481,24 +434,24 @@
 details.)
 
 When called by the server, the application object must return an
-iterable yielding zero or more bytestrings.  This can be accomplished in a
-variety of ways, such as by returning a list of bytestrings, or by the
-application being a generator function that yields bytestrings, or
+iterable yielding zero or more strings.  This can be accomplished in a
+variety of ways, such as by returning a list of strings, or by the
+application being a generator function that yields strings, or
 by the application being a class whose instances are iterable.
 Regardless of how it is accomplished, the application object must
-always return an iterable yielding zero or more bytestrings.
+always return an iterable yielding zero or more strings.
 
-The server or gateway must transmit the yielded bytestrings to the client
-in an unbuffered fashion, completing the transmission of each bytestring
+The server or gateway must transmit the yielded strings to the client
+in an unbuffered fashion, completing the transmission of each string
 before requesting another one.  (In other words, applications
 **should** perform their own buffering.  See the `Buffering and
 Streaming`_ section below for more on how application output must be
 handled.)
 
-The server or gateway should treat the yielded bytestrings as binary byte
+The server or gateway should treat the yielded strings as binary byte
 sequences: in particular, it should ensure that line endings are
 not altered.  The application is responsible for ensuring that the
-bytestring(s) to be written are in a format suitable for the client.  (The
+string(s) to be written are in a format suitable for the client.  (The
 server or gateway **may** apply HTTP transfer encodings, or perform
 other transformations for the purpose of implementing HTTP features
 such as byte-range transmission.  See `Other HTTP Features`_, below,
@@ -519,7 +472,7 @@
 generator support, and other common iterables with ``close()`` methods.
 
 (Note: the application **must** invoke the ``start_response()``
-callable before the iterable yields its first body bytestring, so that the
+callable before the iterable yields its first body string, so that the
 server can send the headers before any body content.  However, this
 invocation **may** be performed by the iterable's first iteration, so
 servers **must not** assume that ``start_response()`` has been called
@@ -612,7 +565,7 @@
 
 Note: missing variables (such as ``REMOTE_USER`` when no
 authentication has occurred) should be left out of the ``environ``
-dictionary.  Also note that CGI-defined variables must be native strings,
+dictionary.  Also note that CGI-defined variables must be strings,
 if they are present at all.  It is a violation of this specification
 for a CGI variable's value to be of any type other than ``str``.
 
@@ -632,9 +585,9 @@
                        ``"http"`` or ``"https"``, as appropriate.
 
 ``wsgi.input``         An input stream (file-like object) from which
-                       the HTTP request body bytes can be read.  (The
-                       server or gateway may perform reads on-demand
-                       as requested by the application, or it may pre-
+                       the HTTP request body can be read.  (The server
+                       or gateway may perform reads on-demand as
+                       requested by the application, or it may pre-
                        read the client's request body and buffer it
                        in-memory or on disk, or use any other
                        technique for providing such an input stream,
@@ -649,12 +602,6 @@
                        ending, and assume that it will be converted to
                        the correct line ending by the server/gateway.
 
-                       (On platforms where the ``str`` type is unicode,
-                       the error stream **should** accept and log
-                       arbitary unicode without raising an error; it
-                       is allowed, however, to substitute characters
-                       that cannot be rendered in the stream's encoding.)
-
                        For many servers, ``wsgi.errors`` will be the
                        server's main error log. Alternatively, this
                        may be ``sys.stderr``, or a log file of some
@@ -798,7 +745,7 @@
 The ``start_response`` callable **must not** actually transmit the
 response headers.  Instead, it must store them for the server or
 gateway to transmit **only** after the first iteration of the
-application return value that yields a non-empty bytestring, or upon
+application return value that yields a non-empty string, or upon
 the application's first invocation of the ``write()`` callable.  In
 other words, response headers must not be sent until there is actual
 body data available, or until the application's returned iterable is
@@ -873,12 +820,12 @@
 avoid the need to close the client connection.  If the application
 does *not* call the ``write()`` callable, and returns an iterable
 whose ``len()`` is 1, then the server can automatically determine
-``Content-Length`` by taking the length of the first bytestring yielded
+``Content-Length`` by taking the length of the first string yielded
 by the iterable.
 
 And, if the server and client both support HTTP/1.1 "chunked
 encoding" [3]_, then the server **may** use chunked encoding to send
-a chunk for each ``write()`` call or bytestring yielded by the iterable,
+a chunk for each ``write()`` call or string yielded by the iterable,
 thus generating a ``Content-Length`` header for each chunk.  This
 allows the server to keep the client connection alive, if it wishes
 to do so.  Note that the server **must** comply fully with RFC 2616
@@ -903,7 +850,7 @@
 
 The corresponding approach in WSGI is for the application to simply
 return a single-element iterable (such as a list) containing the
-response body as a single bytestring.  This is the recommended approach
+response body as a single string.  This is the recommended approach
 for the vast majority of application functions, that render
 HTML pages whose text easily fits in memory.
 
@@ -952,12 +899,12 @@
 middleware components **must not** block iteration waiting for
 multiple values from an application iterable.  If the middleware
 needs to accumulate more data from the application before it can
-produce any output, it **must** yield an empty bytestring.
+produce any output, it **must** yield an empty string.
 
 To put this requirement another way, a middleware component **must
 yield at least one value** each time its underlying application
 yields a value.  If the middleware cannot yield any other value,
-it must yield an empty bytestring.
+it must yield an empty string.
 
 This requirement ensures that asynchronous applications and servers
 can conspire to reduce the number of threads that are required
@@ -999,22 +946,22 @@
 potentially providing better throughput for the server as a whole.
 
 The ``write()`` callable is returned by the ``start_response()``
-callable, and it accepts a single parameter:  a bytestring to be
+callable, and it accepts a single parameter:  a string to be
 written as part of the HTTP response body, that is treated exactly
 as though it had been yielded by the output iterable.  In other
 words, before ``write()`` returns, it must guarantee that the
-passed-in bytestring was either completely sent to the client, or
+passed-in string was either completely sent to the client, or
 that it is buffered for transmission while the application
 proceeds onward.
 
 An application **must** return an iterable object, even if it
 uses ``write()`` to produce all or part of its response body.
 The returned iterable **may** be empty (i.e. yield no non-empty
-bytestrings), but if it *does* yield non-empty bytestrings, that output
+strings), but if it *does* yield non-empty strings, that output
 must be treated normally by the server or gateway (i.e., it must be
 sent or queued immediately).  Applications **must not** invoke
 ``write()`` from within their return iterable, and therefore any
-bytestrings yielded by the iterable are transmitted after all bytestrings
+strings yielded by the iterable are transmitted after all strings
 passed to ``write()`` have been sent to the client.
 
 
@@ -1023,9 +970,9 @@
 
 HTTP does not directly support Unicode, and neither does this
 interface.  All encoding/decoding must be handled by the application;
-all strings passed to or from the server must be of type ``str`` or
-``bytes``, never ``unicode``.  The result of using a ``unicode``
-object where a string object is required, is undefined.
+all strings passed to or from the server must be standard Python byte
+strings, not Unicode objects.  The result of using a Unicode object
+where a string object is required, is undefined.
 
 Note also that strings passed to ``start_response()`` as a status or
 as response headers **must** follow RFC 2616 with respect to encoding.
@@ -1033,7 +980,7 @@
 MIME encoding.
 
 On Python platforms where the ``str`` or ``StringType`` type is in
-fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all
+fact Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all
 "strings" referred to in this specification must contain only
 code points representable in ISO-8859-1 encoding (``\u0000`` through
 ``\u00FF``, inclusive).  It is a fatal error for an application to
@@ -1041,18 +988,12 @@
 Similarly, servers and gateways **must not** supply
 strings to an application containing any other Unicode characters.
 
-Again, all objects referred to in this specification as "strings"
-**must** be of type ``str`` or ``StringType``, and **must not** be
-of type ``unicode`` or ``UnicodeType``.  And, even if a given platform
-allows for more than 8 bits per character in ``str``/``StringType``
-objects, only the lower 8 bits may be used, for any value referred
-to in this specification as a "string".
-
-For values referred to in this specification as "bytestrings"
-(i.e., values read from ``wsgi.input``, passed to ``write()``
-or yielded by the application), the value **must** be of type
-``bytes`` under Python 3, and ``str`` in earlier versions of
-Python.
+Again, all strings referred to in this specification **must** be
+of type ``str`` or ``StringType``, and **must not** be of type
+``unicode`` or ``UnicodeType``.  And, even if a given platform allows
+for more than 8 bits per character in ``str``/``StringType`` objects,
+only the lower 8 bits may be used, for any value referred to in
+this specification as a "string".
 
 
 Error Handling
@@ -1507,7 +1448,7 @@
    ``environ`` dictionary.  This is the recommended approach for
    offering any such value-added services.
 
-2. Why can you call ``write()`` *and* yield bytestrings/return an
+2. Why can you call ``write()`` *and* yield strings/return an
    iterable?  Shouldn't we pick just one way?
 
    If we supported only the iteration approach, then current
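The last hunk touches the FAQ entry on why an application may both call ``write()`` and return an iterable. The diff also restates the ordering rules involved:

* ``write()`` output is treated as though it had been yielded by the iterable.
* Anything the returned iterable yields is sent after all data passed to ``write()``.
* An iterable must be returned even when it is empty.
* Headers are not considered sent until the first non-empty body chunk goes out.

Those rules can be illustrated with a toy in-memory server. This is a minimal sketch, not a real gateway. ``run_app``, ``mixed_app`` and ``write_only_app`` are hypothetical names used only for illustration, not APIs from PEP 333 or ``wsgiref``. The sketch assumes Python 3 and uses ``bytes`` for body data, which is the convention this commit reverts; under the restored PEP 333 wording, plain Python 2 ``str`` objects would be used instead.

```python
# Toy "server" for illustration only: run_app() is a hypothetical helper,
# not an API from PEP 333 or wsgiref.  Body data is bytes (Python 3).


def run_app(app, environ=None):
    """Run a WSGI app; return (status, headers, body chunks in send order)."""
    state = {"status": None, "headers": None, "headers_sent": False}
    body = []

    def write(data):
        # Empty chunks carry no body data, so nothing is "sent" for them.
        if not data:
            return
        if state["status"] is None:
            raise AssertionError("body data before start_response()")
        # Headers count as sent once the first non-empty chunk goes out.
        state["headers_sent"] = True
        body.append(data)

    def start_response(status, response_headers, exc_info=None):
        if exc_info is not None:
            # Replacing headers is only allowed before they were sent.
            if state["headers_sent"]:
                raise exc_info[1].with_traceback(exc_info[2])
        elif state["status"] is not None:
            raise AssertionError("start_response() called twice")
        state["status"] = status
        state["headers"] = list(response_headers)
        return write  # the write(body_data) callable

    result = app(environ if environ is not None else {}, start_response)
    try:
        # Anything the iterable yields is sent after earlier write() calls.
        for data in result:
            write(data)
    finally:
        if hasattr(result, "close"):
            result.close()
    return state["status"], state["headers"], body


def mixed_app(environ, start_response):
    write = start_response("200 OK", [("Content-type", "text/plain")])
    write(b"written first\n")     # imperative, write()-style output
    return [b"then iterated\n"]   # an iterable must still be returned


def write_only_app(environ, start_response):
    write = start_response("200 OK", [("Content-type", "text/plain")])
    write(b"all output via write()\n")
    return []                     # an empty iterable is allowed
```

Here ``run_app(mixed_app)`` returns ``('200 OK', [('Content-type', 'text/plain')], [b'written first\n', b'then iterated\n'])``. The ``write()`` chunk comes first, and the iterated chunk follows it. Because the server consumes the iterable only after the application function returns, ``start_response()`` may also be called lazily from inside a generator's first iteration. This matches the rule that servers must not assume it was called before iteration begins.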