[Web-SIG] WSGI 2

Wed Aug 12 00:14:03 CEST 2009

Using bytes for all `environ` values is easy to understand on the
application side as long as you are aware of the encoding problem. The
cost is inconvenience, but that's probably OK. It's also simpler to
implement on the gateway/server side.

By choosing bytes, WSGI passes the encoding problem to the application,
which is good. Let the application deal with that. It's more likely to
know what it needs and which problems it can ignore. I think that 99% of
the time, applications will just decode bytes to strings using UTF-8,
ignoring invalid values.
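
To make that common case concrete, here is a minimal sketch of decoding a
single value while silently dropping invalid bytes. The helper name
`decode_value` and the sample values are made up for illustration; the
only real API used is `bytes.decode` with `errors="ignore"`.

```python
def decode_value(raw, encoding="utf-8"):
    """Decode one bytes value, dropping any bytes that are not valid."""
    # errors="ignore" discards undecodable bytes instead of raising
    # UnicodeDecodeError.
    return raw.decode(encoding, errors="ignore")


# Hypothetical sample values; a real gateway would supply these.
valid = decode_value(b"/caf\xc3\xa9")   # well-formed UTF-8
broken = decode_value(b"/caf\xe9")      # Latin-1 byte, invalid as UTF-8
```

An application that cares about invalid input could use `errors="strict"`
or `errors="replace"` instead, which is exactly the kind of choice that is
better left to the application.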

However, it's likely that we'll see middlewares converting ALL
environment values to UTF-8 strings, because it's more convenient than
using bytes. And some middlewares might depend on `environ` values being
strings instead of bytes, because that's convenient too.

This issue was already raised by Graham, and I think it's important to
make it clear. I believe that 'server/CGI' values in the environment
shouldn't be modified. Of course, it should still be possible to add new
values. This way the stack will always remain in a 'sane' state.

For example, if a middleware wants to convert environ values to UTF-8
strings, it shouldn't do this:

>   # overwrites the original server/CGI values in place
>   for key, value in environ.items():
>       environ[key] = str(value, encoding='utf8')

Instead, it should do something like this, assuming there are only bytes
in `environ`:

>   environ['unicode.environ'] = dict((key, str(value, encoding='utf8'))
>                                     for key, value in environ.items())
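
Wrapped up as a middleware, the same idea might look like the sketch
below. The class name `UnicodeEnvironMiddleware` is hypothetical, and the
code assumes the bytes-valued `environ` proposed in this thread. It only
decodes values that really are bytes, so other objects in `environ` (such
as `wsgi.input`) are skipped, and the original keys are never touched.

```python
class UnicodeEnvironMiddleware:
    """Hypothetical middleware: adds a decoded copy under a new key."""

    def __init__(self, app, encoding="utf-8"):
        self.app = app
        self.encoding = encoding

    def __call__(self, environ, start_response):
        # Build a separate dict; the original bytes values stay as they are.
        decoded = {
            key: value.decode(self.encoding, errors="ignore")
            for key, value in environ.items()
            if isinstance(value, bytes)
        }
        # Only add a new key, never overwrite a server/CGI one.
        environ["unicode.environ"] = decoded
        return self.app(environ, start_response)
```

Anything further down the stack can then read
`environ['unicode.environ']['PATH_INFO']` if it wants strings, while code
that expects bytes keeps working.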

I'm in favor of using bytes everywhere. But it's important to document
why bytes are used and how to use them. I'm not sure this belongs in a
PEP; maybe in a "WSGI best practices" document?

Cheers,

-- 
  Henry Prêcheur