[Web-SIG] WSGI key conventions

Ian Bicking ianb at colorstudy.com
Thu May 26 05:04:34 CEST 2005


There are a couple of conventions I'd like to suggest, related to URLs.

The first is needed because some adapters (like mod_scgi) do not set
PATH_INFO.  I presume this is because, while you typically set the
handler for a specific Location, the handler only knows that it handles
the complete URL.  So when you have:

<Location /app>
   SetHandler scgi-handler
</Location>

The SCGI handler doesn't know that it's only mounted at /app, and that
it should set SCRIPT_NAME to /app.  I believe mod_webkit *does* do this,
though, so maybe it's possible to fix this in mod_scgi.  (However,
mod_webkit gets really funky when you use it in <Location />, which may
be related.)  FastCGI gets this right too, I think.  HTTP proxies
don't.  You can't add variables to a proxied HTTP request, though you
can add headers, but that would show up as something like
HTTP_SCRIPT_NAME...?

Anyway, while this can be configured on the Python side, it would be
best to keep this together in the Apache configuration (or at least
there should be the option).  A simple way is to set an environment
variable, like:

<Location /app>
   SetEnv WSGI_SCRIPT_NAME "/app"
   SetHandler scgi-handler
</Location>

(I haven't checked the Apache reference, so excuse any misspellings here)

It would be nice if this were a convention: when SCRIPT_NAME is not
explicitly configured on the Python side, WSGI applications should pick
up this value (particularly when they are connected to something that
is known not to resolve SCRIPT_NAME and PATH_INFO properly, and
PATH_INFO is missing from the environment dictionary).
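
A rough sketch of how the Python side might honor that variable is
below.  It assumes the adapter leaves the complete URL path in
SCRIPT_NAME when it fails to set PATH_INFO (an assumption, not verified
against mod_scgi), and the name script_name_fixup is only illustrative:

```python
def script_name_fixup(app):
    """Hypothetical middleware: honor WSGI_SCRIPT_NAME if PATH_INFO is absent."""
    def replacement(environ, start_response):
        prefix = environ.get('WSGI_SCRIPT_NAME')
        if prefix is not None and 'PATH_INFO' not in environ:
            # Assumption: the adapter put the complete URL path in SCRIPT_NAME.
            full_path = environ.get('SCRIPT_NAME', '')
            prefix = prefix.rstrip('/')
            # Only split when the configured prefix really matches the path.
            if full_path == prefix or full_path.startswith(prefix + '/'):
                environ['SCRIPT_NAME'] = prefix
                environ['PATH_INFO'] = full_path[len(prefix):]
        return app(environ, start_response)
    return replacement
```

If PATH_INFO is already present, or the prefix doesn't match, the
environment is passed through untouched, so wrapping an application
that sits behind a well-behaved adapter should be harmless.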


One other convention that I'd like is for extra variables that are
parsed out of the request.  For instance, let's say you have domain
names like username.application.com (using a wildcard *.application.com
DNS entry).  The WSGI middleware that parses this might look like:

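def parse_username_middleware(app):
    def replacement(environ, start_response):
        # The first label of the host name is the username.
        host = environ['HTTP_HOST']
        # setdefault keeps any variables an earlier middleware stored.
        environ.setdefault('url.vars', {})['username'] = host.split('.')[0]
        return app(environ, start_response)
    return replacement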

The idea of making 'url.vars' a convention is that different frameworks
could give access to this specific set of variables (which could be
derived from the path, hostname, or other data in the environment).
Some might simply put it in a big pool of variables (GET, POST, Cookie),
while others might give access to it separately.  Either way, I've run
into this several times with URL-parsing middleware, and I keep
stuffing the results in random locations, so a convention would be nice.
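
To make the consuming side concrete, here is a small sketch of an
application reading 'url.vars'.  The greeting_app name and the
'anonymous' fallback are made up for illustration; the middleware is
the one from above, repeated so the example runs on its own:

```python
def parse_username_middleware(app):
    # Same idea as the middleware above, repeated so this sketch is standalone.
    def replacement(environ, start_response):
        host = environ['HTTP_HOST']
        environ.setdefault('url.vars', {})['username'] = host.split('.')[0]
        return app(environ, start_response)
    return replacement

def greeting_app(environ, start_response):
    # Hypothetical application: it only looks at the shared 'url.vars' dict,
    # so it doesn't care whether the value came from the host or the path.
    url_vars = environ.get('url.vars', {})
    username = url_vars.get('username', 'anonymous')
    body = ('Hello, %s\n' % username).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]

application = parse_username_middleware(greeting_app)
```

The application never touches HTTP_HOST itself, which is the point of
agreeing on one key: the parsing middleware can be swapped out without
changing the code that uses the result.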

-- 
Ian Bicking  /  ianb at colorstudy.com  / http://blog.ianbicking.org

