[Web-SIG] WSGI 2

Graham Dumpleton graham.dumpleton at gmail.com
Wed Aug 12 02:40:26 CEST 2009


2009/8/12 Ian Bicking <ianb at colorstudy.com>:
> On Tue, Aug 11, 2009 at 6:25 PM, Graham Dumpleton
> <graham.dumpleton at gmail.com> wrote:
>>
>> 2009/8/12 Henry Precheur <henry at precheur.org>:
>> > Using bytes for all `environ` values is easy to understand on the
>> > application side as long as you are aware of the encoding problem. The
>> > cost is inconvenience, but that's probably OK. It's also simpler to
>> > implement on the gateway/server side.
>>
>> Use of bytes everywhere can be inconvenient on the gateway/server
>> side, at least as far as the end result for the user goes.
>>
>> The specific problem is that WSGI environment is used to hold
>> information about the original request, as CGI variables, but also can
>> hold user specified custom variables.
>>
>> In the case of anything hosted via Apache, such as through mod_wsgi,
>> mod_fastcgi, mod_fcgid, mod_scgi and mod_cgi(d), users can set such
>> custom variables using the SetEnv directive. Thus one might say:
>>
>>  SetEnv trac.env_path /usr/local/trac/site-1
>
> Just to clarify, there specifically are no type restrictions on extension
> variables, which is any variable with a "." in it.  The type restrictions
> are solely for ALL_CAPS keys.  You can put ints or unicode or whatever in
> other variables.  (Probably this doesn't make things any easier for
> mod_wsgi, though; at least for this example)

If you want to change what the specification says from:

"""Finally, the environ dictionary may also contain server-defined
variables. These variables should be named using only lower-case
letters, numbers, dots, and underscores, and should be prefixed with a
name that is unique to the defining server or gateway."""

to:

"""Finally, the environ dictionary may also contain server-defined
variables. These variables MUST be named using only lower-case
letters, numbers, dots, and underscores, and should be prefixed with a
name that is unique to the defining server or gateway."""

then that goes part of the way, as at least one is drawing a line
between what is being construed as a CGI variable, and so would be
bytes, and adapter/application variables, which would be converted to
strings in whatever encoding makes sense for the server configuration
system, which in the case of Apache would be UTF-8.

The above description would also have to be changed, though, inasmuch
as at the moment it says:

"""should be prefixed with a name that is unique to the defining
server or gateway"""

In practice this isn't really correct, as the server configuration is
just providing the mechanism for setting them, and they may not
necessarily be server or gateway variables but variables a user is
setting to customise the behaviour of the application. The way I read
that line, strictly speaking, even though set as:

  SetEnv trac.env_path /usr/local/trac/site-1

it should be passed through as:

  mod_wsgi.trac.env_path

which would be rather silly. Thus the description needs to cater for the
fact that application variables may be settable from the server
configuration and passed through as is.
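As a minimal sketch of the application side, assuming the variable is passed through under exactly the name given to SetEnv and arrives as a str, a hypothetical WSGI application would simply read it directly:

```python
def application(environ, start_response):
    # Assumption: 'trac.env_path' arrives unprefixed, i.e. not as
    # 'mod_wsgi.trac.env_path', and its value is already a str.
    env_path = environ.get("trac.env_path", "<not set>")
    body = ("trac.env_path = %s\n" % env_path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Under the strict reading of the spec the lookup key would instead have to carry the server's prefix, which ties the application to one particular gateway.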

Anyway, if the rule is that anything in upper case is treated as CGI
and passed as bytes, and anything in lower case isn't and is passed as
a string, appropriately decoded, then that would eliminate one point of
confusion as far as expectations go. It may not make things any easier
for CGI under Python 3.0, though, where the values would all be strings
anyway.
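A rough sketch of that rule on the gateway side follows. `build_environ` is a hypothetical helper (not part of mod_wsgi or any spec), raw values are assumed to arrive as bytes, and UTF-8 is assumed as the configuration encoding, as for Apache:

```python
import re

# Keys made only of upper-case letters, digits and underscores are treated
# as CGI variables (e.g. REQUEST_METHOD, PATH_INFO); anything else is
# treated as a server/application variable.
_CGI_NAME = re.compile(r"[A-Z0-9_]+")


def build_environ(raw_vars, encoding="utf-8"):
    """Hypothetical gateway helper applying the upper/lower-case rule.

    `raw_vars` maps variable names (str) to raw values (bytes) as the
    server received them from the request or its configuration.
    """
    environ = {}
    for name, raw in raw_vars.items():
        if _CGI_NAME.fullmatch(name):
            environ[name] = raw                   # CGI variable: left as bytes
        else:
            environ[name] = raw.decode(encoding)  # config variable: decoded str
    return environ


environ = build_environ({
    "REQUEST_METHOD": b"GET",
    "PATH_INFO": b"/caf\xc3\xa9",
    "trac.env_path": b"/usr/local/trac/site-1",
})
```

The point of keying on case is that the application can know the type of a value from its name alone, without inspecting it.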

Now, is anyone willing to address the problem pointed out by others,
namely that being able to return either bytes or strings (latin-1) for
response headers is a pain for WSGI middleware to deal with?
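To illustrate the burden, here is a sketch of what even trivial middleware would have to do if either form were allowed. `header_to_str` and `ContentTypeSniffer` are hypothetical names, and the latin-1 handling assumes the rule as stated above:

```python
def header_to_str(value):
    """Return a header name or value as a str, whichever form the app used.

    If either bytes or latin-1 str is allowed, middleware must normalise
    before it can compare or rewrite a header.
    """
    if isinstance(value, bytes):
        return value.decode("latin-1")
    # Raises UnicodeEncodeError if the str is not representable in latin-1.
    value.encode("latin-1")
    return value


class ContentTypeSniffer:
    """Hypothetical middleware that records the response Content-Type."""

    def __init__(self, app):
        self.app = app
        self.last_content_type = None

    def __call__(self, environ, start_response):
        def wrapped_start_response(status, headers, exc_info=None):
            for name, value in headers:
                # Names and values may each be bytes or str, in any mix.
                if header_to_str(name).lower() == "content-type":
                    self.last_content_type = header_to_str(value)
            return start_response(status, headers, exc_info)

        return self.app(environ, wrapped_start_response)
```

Every piece of middleware that looks at headers ends up repeating this normalisation, which is the pain being referred to.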

Graham

