[Web-SIG] WSGI 2

Graham Dumpleton graham.dumpleton at gmail.com
Wed Aug 5 02:53:14 CEST 2009


2009/8/5 P.J. Eby <pje at telecommunity.com>:
> So what, precisely, are you proposing should happen when such bytes are
> present?

Treat me as a business manager who has read just enough IT magazines
to be dangerous. As I have said before in prior discussions, my area
is C coding and trying to implement a Python hosting solution for
Apache. I do not know all the intricacies of HTTP and web application
development as I don't write web applications. That is why I defer to
you guys to come up with a workable specification. If I don't see
anything sensible coming back in the way of a proposal, I will try and
suggest my own, but because of my lack of knowledge it isn't
necessarily going to be right. In this respect, just pushing it back
on me isn't particularly helpful from my perspective. If you think
something is outright wrong and not going to work, then come back
with an overall solution which is going to work. So far no one else
has come back with an overall solution that works and that everyone is
happy with, and I seem to be the only one truly interested in
progressing this. As such it is really frustrating.

Now, the main reason why I am throwing around alternate suggestions in
the first place is that last time, although people seemed to be
comfortable moving along with the idea of latin-1 everywhere, I knew
of some who weren't happy with that, some not on the list, and who
believed it should be bytes, but they weren't speaking up. In this
discussion people are being more vocal about bytes being the way to go
and I am quite happy with that; we just need to flesh out the various
problems with going that way. So, let us put aside UTF-8 as a workable
solution for Python and focus then on bytes instead. We also need to
address other comments by people about whether status and header
values in the response should come back as bytes or strings to allow
predictability for WSGI middleware.

The questions around use of bytes in my mind are:

1. Should the values of all CGI variables be bytes or just a subset of
them? If a subset, which ones? Note I am presuming here that the name
of the header, i.e., the key, will be a string and only the value will
be bytes. Is that even a correct assumption?

2. How would use of bytes work for a CGI-WSGI bridge given that
os.environ is not bytes? Where does one get what encoding was used for
os.environ values so it can be converted back to bytes?

3. What are the rules about WSGI middleware in respect of preservation
of values as bytes? I can see too easily that people will convert
SCRIPT_NAME and PATH_INFO to string to do stuff with and change them
and then not convert them back to bytes if environ is modified with
new values. The rules would have to be clearly specified.
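
To make questions 1 to 3 a bit more concrete, here is a rough sketch of
what I have in mind. It is not a proposal. The BYTES_KEYS subset, the
helper names and the use of latin-1 as the bridge encoding are all
assumptions made purely for illustration; none of this is in any
specification.

```python
# Hypothetical subset of CGI variables whose values would be bytes.
# Which keys belong here is exactly what question 1 asks.
BYTES_KEYS = ("SCRIPT_NAME", "PATH_INFO", "QUERY_STRING")


def environ_from_cgi(cgi_vars, encoding="latin-1"):
    """Build an environ with string keys and bytes values for BYTES_KEYS.

    The encoding is a guess (assumption): a CGI bridge has no reliable
    way to learn how the os.environ values were decoded (question 2).
    """
    environ = dict(cgi_vars)
    for key in BYTES_KEYS:
        value = environ.get(key)
        if isinstance(value, str):
            environ[key] = value.encode(encoding)
    return environ


def strip_prefix_middleware(app, prefix):
    """Hypothetical middleware that moves a path prefix into SCRIPT_NAME."""
    def wrapper(environ, start_response):
        # Decode to strings so the values can be manipulated...
        path = environ["PATH_INFO"].decode("latin-1")
        script = environ["SCRIPT_NAME"].decode("latin-1")
        if path.startswith(prefix):
            script, path = script + prefix, path[len(prefix):]
        # ...and convert back to bytes before passing environ on, which
        # is the step I expect people to forget (question 3).
        environ["SCRIPT_NAME"] = script.encode("latin-1")
        environ["PATH_INFO"] = path.encode("latin-1")
        return app(environ, start_response)
    return wrapper
```

A CGI bridge would call something like environ_from_cgi(os.environ).
latin-1 is used for the round trip only because it maps every byte
value to a character and back again without loss; whether it matches
what the web server actually sent is a separate matter.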

We then have the issues others have raised about response.

4. Should there be a choice about a WSGI application/middleware
returning bytes or a string which is automatically converted to bytes
per latin-1? If no choice, which should be required to be returned, bytes
or strings?
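
If a choice were allowed, I imagine the server side ending up with
something along these lines. Again, this is only a sketch under
assumptions: the function names are made up, and treating header names
the same way as header values is my guess rather than anything agreed.

```python
def to_bytes(value):
    """Accept bytes as-is; convert a string to bytes per latin-1."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        # Raises UnicodeEncodeError for characters outside latin-1.
        return value.encode("latin-1")
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def normalize_response(status, headers):
    """Return the status and headers with every name and value as bytes."""
    return to_bytes(status), [(to_bytes(k), to_bytes(v)) for k, v in headers]
```

The cost of allowing both is that WSGI middleware sitting between the
application and the server has to cope with either type, which is the
predictability concern others have raised.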

So, let's focus on these issues instead then and any others that people
have in relation to bytes or how responses are returned and so explore
that option.

Graham

