[Web-SIG] WSGI 2

Henry Precheur henry at precheur.org
Wed Aug 12 02:40:25 CEST 2009


On Wed, Aug 12, 2009 at 09:25:21AM +1000, Graham Dumpleton wrote:
> Use of bytes everywhere can be inconvenient on the gateway/server
> side, at least as far as end result for user.

Yes, but wouldn't it be simpler for mod_wsgi to only deal with bytes?
unicode C strings -> bytes and char* -> bytes conversions seem
straightforward.

But char* -> string doesn't look easy to do, since you have to 'guess'
the encoding.

These are suppositions; I have never worked on a WSGI server/gateway.
Correct me if I'm wrong.
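
To illustrate the 'guess the encoding' problem, here is a minimal
sketch. The byte string is a made-up PATH_INFO value, not taken from any
real server, and the two encodings are only examples of plausible guesses.

```python
# Illustrative only: the same raw bytes decode to different text
# depending on which encoding the gateway guesses.
raw = b"/caf\xc3\xa9"  # hypothetical PATH_INFO bytes as received (char*)

as_utf8 = raw.decode("utf-8")      # one plausible guess
as_latin1 = raw.decode("latin-1")  # another plausible guess

# Both decodes succeed without error, so the gateway has no way to
# detect that one of the guesses produced the wrong text.
assert as_utf8 != as_latin1

# Handing the bytes over unchanged loses nothing: the application can
# decode later, once it knows which encoding it expects.
assert as_utf8.encode("utf-8") == raw
```

With bytes, the server/gateway never has to make that choice.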

> The specific problem is that WSGI environment is used to hold
> information about the original request, as CGI variables, but also can
> hold user specified custom variables.
> 
> In the case of anything hosted via Apache, such as through mod_wsgi,
> mod_fastcgi, mod_fcgid, mod_scgi and mod_cgi(d), users can set such
> custom variables using the SetEnv directive. Thus one might say:
> 
>   SetEnv trac.env_path /usr/local/trac/site-1
> 
> If the rule is that everything in WSGI environment coming from WSGI
> adapter must be bytes then you have a potential for mismatch in
> expectations of how values will be passed. That is, if set using
> SetEnv then would be bytes, but if set using WSGI middleware wrapper
> for configuration, more likely going to be string. It would seem
> overly onerous to expect WSGI middleware to use bytes for
> configuration variables as well and so force all consumers to always
> be converting to string using appropriate encoding, where required
> encoding potentially unknown.

Is it reasonable to expect a configuration variable to have a certain
type? I am tempted to say 'no', but that's because I like the "everything
is bytes" approach so much :) I don't have any experience with
configuration variables passed via the WSGI environment though.

But it could be quite a problem. For example, 'Developer authentication',
posted a month ago by Ian Bicking, requires its configuration variable to
be a string, but I don't think this spec applies to WSGI on Py3K or WSGI
2.
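
A rough sketch of the mismatch described above, assuming Python 3
semantics. The helper as_text() and its UTF-8 default are hypothetical,
used only for illustration; nothing in WSGI defines them.

```python
def as_text(value, encoding="utf-8"):
    """Return value as str, decoding only if it arrived as bytes.

    The default encoding is an assumption; a consumer would have to
    know (or guess) the real one.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# As if set with SetEnv, under an "everything from the server is bytes" rule.
from_server = {"trac.env_path": b"/usr/local/trac/site-1"}

# As if set by a WSGI middleware wrapper written with ordinary strings.
from_middleware = {"trac.env_path": "/usr/local/trac/site-1"}

# Same key, same logical value, two different types.
assert isinstance(from_server["trac.env_path"], bytes)
assert isinstance(from_middleware["trac.env_path"], str)

# A consumer that wants text has to normalise both cases itself.
assert as_text(from_server["trac.env_path"]) == as_text(
    from_middleware["trac.env_path"]
)
```

Every consumer of the variable would need a check like this, which is
the cost Graham is pointing at.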

> This is why I specifically asked previously, and which no one has
> answered, if bytes is to be used, which variables in WSGI environment
> should be passed as bytes. If there is a known specified list of
> variables which it is known will always be bytes, may be more
> manageable. If someone is going to suggest that only CGI variables
> should be bytes, then what does that actually mean. Remember that for
> FASTCGI, SCGI, CGI there isn't really a distinction and so where the
> boundary is as to what is a CGI variable is fuzzy although you could
> reverse transformation and get back bytes if know what to do it for.
> 
> One could restrict use of bytes to just SCRIPT_NAME, PATH_INFO and
> QUERY_STRING and maybe that will suffice. It may not though, because
> what about headers such as HTTP_REFERER? Also, what about additional
> SSL_? variables that an SSL module for web server may add?

What you are proposing is 'black-listing' some variables known to cause
problems.

It will be difficult to come up with an exhaustive list of variables
with different encodings. Even if we were able to come up with such a
list, it creates 2 different cases and could end up complicating
application developers' lives. That's why the approach "everything coming
from the server/gateway is bytes" makes sense: it is simpler to explain,
it is simpler to understand, and it's, I think, more pythonic (There
should be one-- and preferably only one --obvious way to do it.)

Just consider the case of cookies. I don't know if you can use non-ASCII
characters in them, but if you can, it is possible that they will break
"everything is string except a, b, c" if we forget to include them in the
list. "Everything is bytes" is in this sense more future-proof than
"black-listing a, b, c". If a variable with a weird encoding appears a
few months after the new PEP is released, "everything is bytes" still
works, but the "black-list" approach stops working.
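
The difference can be sketched like this. BYTES_KEYS, both functions and
the latin-1 default are hypothetical and only stand in for whatever a
real spec would say; keys are kept as str for brevity.

```python
# Hypothetical "black-list": the only keys whose values stay as bytes.
BYTES_KEYS = {"SCRIPT_NAME", "PATH_INFO", "QUERY_STRING"}

def blacklist_environ(raw_environ, encoding="latin-1"):
    """Decode every value except those listed in BYTES_KEYS."""
    return {
        key: value if key in BYTES_KEYS else value.decode(encoding)
        for key, value in raw_environ.items()
    }

def bytes_environ(raw_environ):
    """"Everything is bytes": pass every value through untouched."""
    return dict(raw_environ)

raw = {
    "PATH_INFO": b"/caf\xc3\xa9",
    "HTTP_COOKIE": b"name=caf\xc3\xa9",  # forgotten when the list was written
}

mixed = blacklist_environ(raw)
uniform = bytes_environ(raw)

# The listed key survives, but the forgotten one was decoded with a
# guessed encoding and no longer matches what the client sent as UTF-8.
assert mixed["PATH_INFO"] == raw["PATH_INFO"]
assert mixed["HTTP_COOKIE"] != raw["HTTP_COOKIE"].decode("utf-8")

# With bytes everywhere, a new variable needs no change to any list.
assert uniform["HTTP_COOKIE"] == raw["HTTP_COOKIE"]
```

The second function never needs updating when a new variable shows up.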


Cheers,

-- 
  Henry Prêcheur

