[Web-SIG] My experiences implementing WSGI on java/j2ee/jython.

Mon Aug 30 04:53:12 CEST 2004

At 01:32 AM 8/30/04 +0100, Alan Kennedy wrote:

>I suppose I'm really pointing out a possible wording difficulty in the 
>spec, which says "may be an empty string, if there is no more appropriate 
>value". To me None is "a more appropriate value" sometimes, so I suppose I 
>could legitimately interpret that to mean that I can use None values in my 
>WSGI-compliant framework, because my server infrastructure allows me to 
>detect their absence or lack of value.
>
>So perhaps either the wording of the spec needs to be tightened up to 
>exclude this? Or the default environment values need to be more clearly 
>specified? Or perhaps a discussion of None vs. empty string needs to be added 
>to the Q&A at the end?

I went to add this to the PEP, and found it was already there:

"""Also note that CGI-defined variables must be strings,
if they are present at all.  It is a violation of this specification
for a CGI variable's value to be of any type other than ``str``."""
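
As a rough illustration of that rule, here is a minimal sketch of a check a gateway author might run over an environ dictionary before handing it to an application. The helper name non_string_cgi_vars and the CGI_KEYS tuple are hypothetical, not part of the spec, and the key list is only a sample:

```python
# Hypothetical helper, not part of the WSGI spec: report CGI-style keys
# whose values are not plain strings.
CGI_KEYS = ("REQUEST_METHOD", "SCRIPT_NAME", "PATH_INFO", "QUERY_STRING",
            "CONTENT_TYPE", "CONTENT_LENGTH", "SERVER_NAME", "SERVER_PORT",
            "SERVER_PROTOCOL")


def non_string_cgi_vars(environ):
    """Return the sorted names of CGI-defined variables whose value is not a str."""
    bad = []
    for key, value in environ.items():
        is_cgi = key in CGI_KEYS or key.startswith("HTTP_")
        if is_cgi and not isinstance(value, str):
            bad.append(key)
    return sorted(bad)
```

A None value for QUERY_STRING would be reported here; the conforming choices are an empty string or leaving the key out entirely.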

>So, in terms of compliance with WSGI, am I in violation of the WSGI spec 
>by not transmitting the actual textual status message specified by the 
>application? If that's a problem, there's nothing I can do about it.

Personally, I would just document it as a minor nonconformance of your 
servlet implementation; it's not likely to be an issue in practice.

>It is a fundamental requirement (to me at least) that WSGI be able to handle 
>writing of binary data. And I'm fairly sure the intention for the write() 
>callable in WSGI is that it take python "strings", which includes strings 
>of binary data. But perhaps it needs to be made explicitly clear in the WSGI 
>spec that the write() callable explicitly writes in binary mode, i.e. that 
>no translation is taking place on byte strings passed to it, and the 
>application/user is responsible for all encoding concerns relating to byte 
>strings passed to the write() callable.

Added a note about this.

>However, even though jython is based on python 2.1, and thus doesn't have 
>built-in support for either iterators or generators, I have still 
>implemented the iterator protocol in my java/jython framework, by simply 
>invoking the .__iter__() and .next() methods on application objects, and 
>catching StopIteration exceptions. So I can support components and 
>applications returning iterators, and I'm thus compliant with the spec, 
>even though I'm running on 2.1. (This is only possible because I'm 
>embedding: it is still not possible to support the iterator protocol in, 
>say, jython for-loops)

Unfortunately, your technique doesn't actually work, unless you're also 
going to patch the Jython __builtins__ to include 'StopIteration', 'iter', 
and so forth.  You would have to use the pre-2.2 iteration protocol, which 
uses __getitem__ and IndexError.  I think this would have to be something 
you document as a spinoff or "application note" for WSGI users who must use 
a pre-2.2 version of Python.  One of the reasons we decided to go ahead and 
require 2.2.2 was to avoid having to deal with the absence of True/False, 
iterators, and generators.
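
For illustration only, the older protocol looks roughly like this. The class name Chunks and the function drain are made up for the example; the only real mechanism shown is indexing from zero until IndexError is raised:

```python
class Chunks:
    """Sequence-style iterable: indexed from 0 upward until IndexError."""

    def __init__(self, parts):
        self.parts = parts

    def __getitem__(self, index):
        # List indexing raises IndexError past the end, which ends iteration.
        return self.parts[index]


def drain(result):
    """Collect output the way a pre-2.2 server would have to: by index."""
    collected = []
    index = 0
    while 1:  # no True/False before 2.2.1
        try:
            collected.append(result[index])
        except IndexError:
            break
        index = index + 1
    return collected
```

Nothing here needs StopIteration or iter(), so an application object written this way does not depend on any names the server would have to supply.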

>It's conceivable that even a python 1.5 framework could be programmed to 
>support the iterator protocol: it's *very* easy to implement.

But not actually *usable* in a pre-2.2 Python, because StopIteration 
doesn't exist, so code can't raise it.  If it has to import it from 
somewhere, then it can't be used with multiple WSGI servers or gateways, 
because each one is expecting a different StopIteration class.

>Would it be useful to define a WSGI variable "python.version", similar to 
>"wsgi.version", which gives the python version in effect?

-1; that's what sys.version, sys.hexversion, sys.version_info, and so on 
are for.
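
A minimal sketch of such a check, with HAS_ITERATORS and HAS_ITERATORS_HEX as hypothetical names chosen for the example:

```python
import sys

# Tuple comparison: true on any interpreter at or above 2.2.
HAS_ITERATORS = sys.version_info >= (2, 2)

# The same test against the single-integer form of the version.
HAS_ITERATORS_HEX = sys.hexversion >= 0x02020000
```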

>In the J2EE case (and I'm sure with Apache CGI), that's very simple to 
>deal with, since the container will do its own buffering completely 
>outside your control, and send the pieces with chunked-transfer encoding 
>if necessary. So even if I put a flush on the output channel in my 
>framework, I'm only flushing it to the container's buffer: it's still not 
>guaranteed to send output back down the return socket to the client.

That is potentially a problem, since the point is to guarantee that when 
'write()' returns to the application, the output isn't going to just sit in 
the buffer while the application moves ahead with other things: it should 
be going to the client.
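
To make the intent concrete, here is a sketch of a write() callable that passes bytes through untranslated and flushes before returning. The factory name make_write is hypothetical, and the sketch assumes a Python where binary data is the bytes type. Note that flush() only pushes data past the buffer this code controls; whatever a container buffers beyond that is exactly the problem described above:

```python
import io


def make_write(stream):
    """Build a write() callable over a binary stream (hypothetical helper)."""
    def write(data):
        stream.write(data)  # bytes pass through untranslated
        stream.flush()      # push past our own buffer before returning
    return write


# Usage: a buffered writer over an in-memory "socket".
raw = io.BytesIO()
write = make_write(io.BufferedWriter(raw))
write(b"\x00\xffabc")
```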

>I see the solution to this redirect platform-dependence problem in the 
>implementation of a platform-independent WSGI middleware component that 
>takes all responsibility for redirects. This component examines the 
>wsgi.environment present, seeking hints for the optimal way to redirect 
>the request: if mod_python is available, use the mod_python API call: if 
>modjy is available, use the getDispatcher(uri).redirect() dance, etc. If 
>none of these platform specific techniques are available, it can fall back 
>to sending a 302 or 307 response back to the client, and let the client 
>re-request the new URL.

I'm afraid internal and external redirects are *not* 
interchangeable.  Specifically, internal redirects break relative 
URLs.  So, internal redirects need to be something that's a server 
extension, and *should* be something obscure to do, because you'd better 
know what you're doing.
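
The breakage is easy to demonstrate. In this sketch the URLs and the variable names are invented for the example, and the import is the modern location of urljoin (it lived in the urlparse module in older Pythons):

```python
from urllib.parse import urljoin

# Page the browser asked for, and the page an internal redirect actually served.
requested = "http://example.com/app/old/index.html"
served = "http://example.com/app/new/section/index.html"

link = "images/logo.png"  # relative link in the served page

# The browser only knows the URL it requested, so it resolves against that.
browser_resolves = urljoin(requested, link)

# The page's author expected resolution against the page's real location.
author_intended = urljoin(served, link)
```

With an external redirect the browser would have re-requested the new URL, and the two results would agree.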

>8. Write callable and fileno()
>==============================
>
>It is a good idea to check for the fileno() attribute on the write callable,

No, it isn't.  First of all, it's a callable, not a stream, so it won't 
have such an attribute.  Second, even if it *is* the write method of a 
stream, it's none of the application's business.

Perhaps you're confusing this with the part where the server is allowed to 
check whether the application's return value has a fileno()?
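
In other words, the check belongs on the server side, applied to what the application returns. A rough sketch, assuming a gateway that writes to an operating-system file descriptor; send_result and _write_all are hypothetical names, and a real server would want more care about whether fileno() is actually usable:

```python
import os


def _write_all(fd, data):
    """Keep writing until every byte of data has gone to fd."""
    while data:
        data = data[os.write(fd, data):]


def send_result(result, out_fd):
    """Copy an application's return value to the descriptor out_fd."""
    if hasattr(result, "fileno"):
        # Optional fast path: copy straight from the descriptor.
        in_fd = result.fileno()
        while 1:
            block = os.read(in_fd, 8192)
            if not block:
                break
            _write_all(out_fd, block)
    else:
        # Ordinary path: iterate over the blocks the application yields.
        for block in result:
            _write_all(out_fd, block)
```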

>9. Server-detected headers.
>===========================
>
>I can see the reason for servers/containers intercepting client headers 
>and translating/augmenting/deleting them. However, do we need a 
>specification of what to do with certain specified headers? As with CGI, 
>should I recognise the "Status: " header or the "Location: " header, and 
>translate it to the relevant status code, or do a redirect, respectively? 
>If I don't do those translations, won't I be breaking reams of python CGI 
>code out there that relies on Apache doing this?

Again, WSGI doesn't support internal redirects.  The spec as currently 
written doesn't consider "status" to be a header.  Meanwhile, "Location" is 
a valid HTTP header, so there's no issue there.

If you're doing a WSGI implementation, don't worry about CGI.  If the CGI 
code is ported to WSGI, then fixing these issues is part of the port.  If 
the CGI is run under a "WSGI-to-CGI" wrapper, then this is the wrapper's 
responsibility.  In no case is the interpretation of Status or Location 
headers part of the WSGI server's responsibility.
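
As a sketch of what such a wrapper might do, assuming it has already split the CGI script's output headers into (name, value) pairs: cgi_headers_to_wsgi is a hypothetical helper, and it only covers the external-redirect reading of Location, not CGI's local-path form:

```python
def cgi_headers_to_wsgi(header_lines):
    """Turn parsed CGI output headers into (status, headers) for start_response().

    header_lines is a list of (name, value) pairs; this helper is
    hypothetical and not part of WSGI.
    """
    status = None
    headers = []
    has_location = 0
    for name, value in header_lines:
        lowered = name.lower()
        if lowered == "status":
            status = value  # CGI pseudo-header, never sent on as a header
            continue
        if lowered == "location":
            has_location = 1
        headers.append((name, value))
    if status is None:
        # CGI convention: a bare Location implies a redirect, otherwise 200.
        if has_location:
            status = "302 Found"
        else:
            status = "200 OK"
    return status, headers
```

The wrapper would then pass the returned pair to start_response(); the WSGI server underneath never sees a "Status" header at all.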

>Which makes me wonder what the "wsgi.errors" variable is for? Yes, it's 
>for writing error data. But what do we expect to happen to data that gets 
>written to it? Will it be wrapped or translated in some way, and used 
>to construct an error response to the user? Or should it be locally logged 
>by the server?

"""An output stream to which error output can be written.  For most 
servers, this will be the server's error log."""

I've just added some additional explanatory text:

``wsgi.errors``        An output stream to which error output can be
                        written, for the purpose of recording program
                        or other errors in a standardized and possibly
                        centralized location.  For many servers, this
                        will be the server's main error log.

                        Alternatively, this may be ``sys.stderr``, or
                        a log file of some sort.  The server's
                        documentation should include an explanation of
                        how to configure this or where to find the
                        recorded output.  A server or gateway may
                        supply different error streams to different
                        applications, if this is desired.
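
From the application's side, usage might look like the following sketch. The application and its message are invented for the example, the in-memory stream stands in for whatever the server really supplies, and the body is written as bytes on the assumption of a Python where binary data is a separate type:

```python
import io


def application(environ, start_response):
    """Tiny app that records a diagnostic on wsgi.errors, not in the response."""
    environ["wsgi.errors"].write("demo app: something worth logging\n")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


# Usage: stand in for the server with an in-memory error stream.
errors = io.StringIO()
body = application({"wsgi.errors": errors}, lambda status, headers: None)
```

The diagnostic ends up in the error stream and nothing about it reaches the client.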

>The J2EE ServletContext for each servlet has a "log(message)" method. 
>Maybe I should just send error output there, in which case it will end in 
>the server logs?

That is probably the right place for a servlet-based WSGI gateway to write 
errors to.