[Web-SIG] Clarifications on Python 3.0 and WSGI.
Ian Bicking
ianb at colorstudy.com
Tue Mar 25 17:04:27 CET 2008
Phillip J. Eby wrote:
> At 04:54 PM 3/25/2008 +1100, Graham Dumpleton wrote:
>> Why are servers and gateways being made to accept strings when the
>> preference is for applications to produce bytes for both? Is this
>> acknowledgment that getting people to convert WSGI applications to
>> produce bytes may be a problem?
>
> Yep.
>
>> The text of (2) sort of suggests there is justification for this in
>> saying 'under the existing rules (i.e., s.encode('latin-1') must
>> convert the string to bytes without an exception)', but I can't find
>> such a rule in the WSGI PEP on a quick look. In other words,
>> where in the existing specification does it say that Unicode strings
>> must be accepted? To the contrary, it suggests they can't be, and that
>> using them where a string object is expected is undefined.
>
> It says that in versions of Python where 'str is unicode' (i.e.
> Jython, IronPython, and Python 3000), then the specification should
> be read to define "string" as a unicode string whose characters can
> be expressed in latin-1.
>
> Really, adding support for bytes is the stretch here. In fact, I'd
> almost go so far as to say the heck with bytes support except for the
> response body. I could easily consider headers to be text, instead.

Latin-1? How is this supposed to work at all?

For instance, we treat SCRIPT_NAME/PATH_INFO as UTF-8-encoded strings.
(QUERY_STRING isn't really UTF-8 until you URL-decode it, but SCRIPT_NAME
and PATH_INFO are already decoded.)
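
To make the mechanics concrete, here is a minimal sketch of what the latin-1 reading would mean for a UTF-8 path, assuming the server hands the application a str built by decoding the raw request bytes as latin-1. The helper name path_info_to_text is hypothetical and not part of any WSGI specification; the sketch only illustrates the round trip, not how any particular server behaves.

```python
def path_info_to_text(native_path: str) -> str:
    """Recover UTF-8 text from a str that carries raw bytes as latin-1.

    Hypothetical helper: assumes every character in native_path is in the
    range U+0000..U+00FF, i.e. the str is a byte-for-byte latin-1 decoding.
    """
    raw = native_path.encode("latin-1")   # lossless: one character per byte
    return raw.decode("utf-8")            # the application's own assumption


# Simulate a server that received UTF-8 bytes and decoded them as latin-1.
wire_bytes = "/café".encode("utf-8")
native = wire_bytes.decode("latin-1")     # mojibake, but no bytes are lost

print(native == "/café")                  # False: not usable as text directly
print(path_info_to_text(native) == "/café")

# A str with characters outside latin-1 cannot be converted back to bytes.
try:
    "/€".encode("latin-1")
except UnicodeEncodeError:
    print("not expressible in latin-1")
```

Under this reading the str is only a container for bytes, so every application still has to re-encode and decode before treating the value as text.
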

While there are certain HTTP headers where Latin-1 is a reasonable
assumption, as it has been encoded into various specs, I don't generally
see the purpose. Any application that can only speak Latin-1 is broken.

Am I missing something here?

I would prefer using bytes most places in WSGI where str is currently used.

Ian