[Web-SIG] Clarifications on Python 3.0 and WSGI.

Tue Mar 25 17:04:27 CET 2008

Phillip J. Eby wrote:
> At 04:54 PM 3/25/2008 +1100, Graham Dumpleton wrote:
>> Why are servers and gateways being made to accept strings when the
>> preference is for applications to produce bytes for both? Is this
>> acknowledgment that getting people to convert WSGI applications to
>> produce bytes may be a problem?
> 
> Yep.
> 
>> The text of (2) sort of suggests there is justification for this in
>> saying 'under the existing rules (i.e., s.encode('latin-1') must
>> convert the string to bytes without an exception)', but I can't find
>> such a rule in the WSGI PEP when I take a quick look. In other words,
>> where in the existing specification does it say that Unicode strings
>> must be accepted? On the contrary, it suggests they can't be, and that
>> using them where a string object is expected is undefined.
> 
> It says that in versions of Python where 'str is unicode' (i.e. 
> Jython, IronPython, and Python 3000), then the specification should 
> be read to define "string" as a unicode string whose characters can 
> be expressed in latin-1.
> 
> Really, adding support for bytes is the stretch here.  In fact, I'd 
> almost go so far as to say the heck with bytes support except for the 
> response body.  I could easily consider headers to be text, instead.

Latin-1?  How is this supposed to work at all?

For instance, we treat SCRIPT_NAME/PATH_INFO as UTF-8 encoded strings.
(QUERY_STRING isn't really UTF-8 until you url-decode it, but SCRIPT_NAME
and PATH_INFO are already decoded.)
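To make the Latin-1 question concrete, here is a minimal sketch. It assumes the server decodes the already url-decoded path bytes with latin-1 before storing them in environ. The helper names are hypothetical and do not come from the PEP. Because latin-1 maps each byte 0-255 to exactly one code point, the UTF-8 text is still recoverable, but only if every application performs an extra transcoding step.

```python
# Hypothetical sketch: a UTF-8 PATH_INFO under the
# "str whose characters fit in latin-1" reading of WSGI.

def server_side_path(raw_path: bytes) -> str:
    """What a server might put in environ['PATH_INFO'] under the
    latin-1 rule: each byte becomes one code point U+0000..U+00FF."""
    return raw_path.decode("latin-1")

def app_side_path(path_info: str) -> str:
    """Undo the latin-1 mapping to get the original bytes back,
    then decode them as UTF-8 to get the real text."""
    return path_info.encode("latin-1").decode("utf-8")

# Path bytes as the server sees them after url-decoding.
raw = "/caf\u00e9/\u65e5\u672c".encode("utf-8")
native = server_side_path(raw)

print(repr(native))            # mojibake: one character per UTF-8 byte
print(app_side_path(native))   # the application's intended text
```

The round trip is lossless, but the native str that the application sees is not the text it wants until it re-encodes it.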

While there are certain HTTP headers where Latin-1 is a reasonable
assumption, since it has been written into various specs, I don't see the
purpose in general.  Any application that can only speak Latin-1 is broken.
Am I missing something here?

I would prefer using bytes in most places in WSGI where str is currently used.
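Below is a rough sketch of the bytes-everywhere alternative. It uses a toy in-memory server rather than any real one, and `to_bytes`, `run_once` and the HTTP/1.0 serialization are all hypothetical. The server side still tolerates native strings, but only those for which s.encode('latin-1') succeeds, which is the rule quoted above.

```python
# Hypothetical sketch: an application that emits bytes for the status,
# headers and body, run by a toy server that also accepts str values
# as long as they encode to latin-1.

def to_bytes(value):
    """Return bytes unchanged; encode str via latin-1.
    A str outside latin-1 raises UnicodeEncodeError."""
    if isinstance(value, bytes):
        return value
    return value.encode("latin-1")

def app(environ, start_response):
    body = "h\u00e9llo".encode("utf-8")
    start_response(b"200 OK", [
        (b"Content-Type", b"text/plain; charset=utf-8"),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ])
    return [body]

def run_once(application, environ):
    """Toy server: call the app once and serialize the response to bytes.
    (A real WSGI start_response would also return a write() callable.)"""
    captured = {}

    def start_response(status, headers):
        captured["status"] = to_bytes(status)
        captured["headers"] = [(to_bytes(k), to_bytes(v)) for k, v in headers]

    chunks = application(environ, start_response)
    out = b"HTTP/1.0 " + captured["status"] + b"\r\n"
    for name, value in captured["headers"]:
        out += name + b": " + value + b"\r\n"
    return out + b"\r\n" + b"".join(chunks)

print(run_once(app, {}))
```

Because of the coercion in `to_bytes`, an application that still passes str for the status or headers keeps working, as long as every value fits in latin-1.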

   Ian