[Web-SIG] Request for Comments on upcoming WSGI Changes

Tue Sep 22 16:48:35 CEST 2009

P.J. Eby [mailto:pje at telecommunity.com]
> At 07:40 PM 9/21/2009 -0700, Robert Brewer wrote:
> > Yes; you have to transcode to the "correct" encoding. Once.
> > Then every other WSGI application interface "below" that one
> > doesn't have to care.
> 
> You can only do that if you *break encapsulation*, which as I said
> earlier is voiding the entire point of having a modular interface.

Requiring one component to run before another to achieve a correct
result does not void modularity. Unix pipes employ a modular interface,
but "cat /etc/fstab | wc | head" produces a very different result than
"cat /etc/fstab | head | wc". In such a system, encapsulation requires
that the components not share state, but rather trust that they are
composed correctly (yes, by some "invisible hand") and that the given
input is the intended one, even if that means a previous component
transformed it.
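
To make the pipe analogy concrete, here is a small sketch in Python.
The head() and wc() functions are simplified stand-ins I made up for
this example, not the real utilities: each is self-contained and shares
no state with the other, yet the result depends on the order in which
they are composed.

```python
def head(lines, n=10):
    """Keep only the first n lines, like a simplified head(1)."""
    return lines[:n]


def wc(lines):
    """Return one summary line with line and word counts, like a simplified wc(1)."""
    words = sum(len(line.split()) for line in lines)
    return ["%d %d" % (len(lines), words)]


# 25 lines of three words each (made-up input for the sketch).
lines = ["line %d here" % i for i in range(25)]

counted_then_cut = head(wc(lines))   # like "... | wc | head"
cut_then_counted = wc(head(lines))   # like "... | head | wc"
```

Neither function knows about the other; whoever builds the pipeline is
responsible for the order.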

If, on the other hand, only utf-8-decoded strings can be passed as input
to each WSGI component, then each WSGI component must be prepared to
re-decode its inputs; in that case, each must be configured identically
with the same logic to determine the correct decoding, since the correct
decoding does not differ from one component to the next. That repeated
configuration of the correct decoding is shared state, and breaks
encapsulation; one-time transformation of inputs is not and does not.
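
A minimal sketch of the "transcode once" component, assuming the server
hands over SCRIPT_NAME and PATH_INFO decoded as utf-8 with the
surrogateescape error handler. The name transcode_once and the choice
of shift_jis are hypothetical, picked only for illustration; this is
not code from any existing framework.

```python
def transcode_once(app, encoding):
    """Hypothetical middleware: re-decode the path keys exactly once.

    Assumes the server decoded the raw bytes as utf-8 with
    surrogateescape, so the original bytes can be recovered losslessly.
    """
    def wrapper(environ, start_response):
        for key in ("SCRIPT_NAME", "PATH_INFO"):
            raw = environ.get(key, "").encode("utf-8", "surrogateescape")
            environ[key] = raw.decode(encoding)
        return app(environ, start_response)
    return wrapper


def path_app(environ, start_response):
    """Inner application: trusts its input and never re-decodes."""
    return [environ["PATH_INFO"]]


# Simulate a server that received Shift JIS bytes on the wire.
wire_bytes = "/\u65e5\u672c".encode("shift_jis")
environ = {
    "SCRIPT_NAME": "",
    "PATH_INFO": wire_bytes.decode("utf-8", "surrogateescape"),
}
result = transcode_once(path_app, "shift_jis")(environ, None)
```

Only the one wrapper needs to know the encoding. Everything composed
below it sees the intended text, which is the point about not sharing
configuration between components.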

> Having a configurable encoding just means that *every* WSGI
> application *must* verify the encoding in order to be safe.

No, each can trust its inputs and do its intended job instead, if your
idempotency requirement is relaxed.

> I'm all
> in favor of making everyone suffer equally, but all else being equal,
> I'd prefer them to suffer idempotently rather than conditionally.  ;-)

I know you do, but I don't see the community following your lead in that
preference. Any middleware that alters the environ breaks idempotency.
Any middleware that alters the output breaks idempotency. Most routing
middleware breaks idempotency. There's a lot of all of those already in
the wild.
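
As an illustration of what I mean, here is a sketch of a typical
prefix-shifting router. The names shift_prefix and echo_app are made up
for this example, and "idempotent" is used in the sense of "stacking
the same component twice gives the same result as stacking it once".

```python
def shift_prefix(app, prefix):
    """Hypothetical routing middleware: move a prefix from PATH_INFO to SCRIPT_NAME."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(prefix):
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            environ["PATH_INFO"] = path[len(prefix):]
        return app(environ, start_response)
    return wrapper


def echo_app(environ, start_response):
    """Report what the application finally sees."""
    return [environ["SCRIPT_NAME"], environ["PATH_INFO"]]


def fresh_environ():
    return {"SCRIPT_NAME": "", "PATH_INFO": "/blog/blog/post"}


once = shift_prefix(echo_app, "/blog")
twice = shift_prefix(shift_prefix(echo_app, "/blog"), "/blog")

seen_once = once(fresh_environ(), None)
seen_twice = twice(fresh_environ(), None)
```

Stacked once, the application sees SCRIPT_NAME "/blog" and PATH_INFO
"/blog/post"; stacked twice, it sees "/blog/blog" and "/post". The
component is perfectly modular, but it is not idempotent.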

CherryPy doesn't care, because we marginalized WSGI middleware into near
obscurity. We did that in large part because of the idempotency
requirements of WSGI 1.0. We may have the only routing middleware that
you could mistakenly put in your stack twice and get the same result! So
I'm not fighting for myself/my framework on this; surrogateescape would
work just fine for us since we ship very little middleware.

But I don't think it would work fine for Paste, Pylons, TurboGears,
Repoze, and others, who have lots of WSGI middleware to port and more
they want to build, and who have been chafing for years against this
requirement. I believe they want full Unicode SCRIPT_NAME and PATH_INFO,
and would prefer that a single, new, modular WSGI component be inserted
into their component graphs rather than build that logic into every
WSGI component. They already have to deal with correct ordering in their
WSGI component graphs, because they've already abandoned strict
idempotency. Ben, Ian, Mark, Chris, et al, please confirm or deny that;
I could be way off base.

Robert Brewer
fumanchu at aminus.org