[Web-SIG] The rewritten WSGI pre-PEP

Ian Bicking ianb at colorstudy.com
Wed Aug 11 21:20:23 CEST 2004


Phillip J. Eby wrote:
>> I've had constant problems trying to backtrack through middleware 
>> (like mod_rewrite) to figure out how to create a URL that is internal 
>> to the application.  I'd like to keep around some artifact indicating 
>> what the original URI was (e.g., REQUEST_URI); something that 
>> middleware specifically should not rewrite.  Nor is there any real 
>> reason for it to be rewritten.
> 
> 
> Hm.  And SCRIPT_NAME is insufficient for this?  I think I can see why 
> mod_rewrite would make this a problem, but ISTM that a Python middleware 
> component could do rewrites that left SCRIPT_NAME "logically correct".

I suppose it could, i.e., http:// + SERVER_NAME + ":" + SERVER_PORT + 
SCRIPT_NAME + PATH_INFO + "?" + QUERY_STRING is the complete URL.  If 
that's the expectation, then that too should be in the spec.  But, if 
only because of the existence of mod_rewrite, that's not likely to be 
true.  REQUEST_URI just seems like a natural part of the request 
description -- it says exactly what the client asked for, without the 
extra meaning that SCRIPT_NAME and PATH_INFO have.

In the end I've come to dislike mod_rewrite because of these issues, but 
given its existence...

>> SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and 
>> should just be passed through any middleware.
> 
> 
> Are you sure?  SERVER_ADDR might be different if the request is 
> forwarded to another machine, mightn't it?  I seem to recall that 
> mod_backhand does some stuff with this.  In any case it highlights the 
> trouble with trying to precisely pin down things that are already 
> inherently implementation-defined.  Unfortunately, WSGI isn't really 
> going to eliminate all the environment introspecting and munging code 
> that lives in the various existing apps and frameworks today.

If SERVER_ADDR needs to be rewritten, then SERVER_NAME would be 
rewritten at the same time.

I think I've also seen some inconsistencies between SERVER_NAME and 
HTTP_HOST.  SERVER_NAME tends to be the canonical name of the host, 
ignoring any named virtual hosts (at least in Apache).  So really if you 
are going to construct a URL it should use (environ.get("HTTP_HOST") or 
environ.get("SERVER_NAME")).

Maybe it would be good to include how the URL is supposed to be split 
up, at least informationally.  Like, you can reconstruct the URL by doing:

if environ.get('HTTPS') == 'on':
    url = 'https://'
    default_port = '443'
else:
    url = 'http://'
    default_port = '80'
if environ.get('HTTP_HOST'):
    # The Host header already carries any non-default port.
    url += environ['HTTP_HOST']
else:
    url += environ['SERVER_NAME']
    if environ['SERVER_PORT'] != default_port:
        url += ':' + environ['SERVER_PORT']
url += environ['SCRIPT_NAME']
url += environ.get('PATH_INFO', '')
if environ.get('QUERY_STRING'):
    url += '?' + environ['QUERY_STRING']


This should never fail (no missing keys), and should always be accurate 
except for details like a ? without a query string, an explicit port 
that matches the default, or a path that the server has optionally 
normalized.
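
As a rough sketch, the same steps can be packaged as a function and tried 
against a hand-built environ.  The name reconstruct_url and the sample 
values are assumptions for illustration, not part of the draft spec.  This 
version appends the port only when falling back to SERVER_NAME, on the 
assumption that HTTP_HOST already carries any non-default port.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from CGI-style environ keys (illustrative helper)."""
    secure = environ.get('HTTPS') == 'on'
    url = 'https://' if secure else 'http://'
    if environ.get('HTTP_HOST'):
        # Assumed: the Host header includes the port when it is non-default.
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        default_port = '443' if secure else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']
    url += environ['SCRIPT_NAME']
    url += environ.get('PATH_INFO', '')
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url


# Hypothetical request data, with no Host header supplied.
sample = {
    'SERVER_NAME': 'example.com',
    'SERVER_PORT': '8080',
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/items',
    'QUERY_STRING': 'page=2',
}
print(reconstruct_url(sample))  # http://example.com:8080/app/items?page=2
```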

If it can't be accurate -- e.g., because SCRIPT_NAME or PATH_INFO have 
been muddled (or even QUERY_STRING) -- then I'd like to have a 
REQUEST_URI which is accurate.
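
One way this could look, purely as a sketch: an outermost wrapper records 
REQUEST_URI once, before any rewriting middleware runs, and nothing 
downstream touches it.  The names preserve_request_uri, rewrite_path and 
show_app are hypothetical, and the (environ, start_response) calling 
convention is assumed from the draft; a real server would more likely 
supply REQUEST_URI itself from the raw request line.

```python
def preserve_request_uri(app):
    """Record the URI as first seen, unless the server already supplied it."""
    def wrapper(environ, start_response):
        if 'REQUEST_URI' not in environ:
            uri = environ.get('SCRIPT_NAME', '') + environ.get('PATH_INFO', '')
            if environ.get('QUERY_STRING'):
                uri += '?' + environ['QUERY_STRING']
            environ['REQUEST_URI'] = uri
        return app(environ, start_response)
    return wrapper


def rewrite_path(app, old, new):
    """Stand-in for a mod_rewrite-style component that muddles PATH_INFO."""
    def wrapper(environ, start_response):
        if environ.get('PATH_INFO') == old:
            environ['PATH_INFO'] = new
        return app(environ, start_response)
    return wrapper


def show_app(environ, start_response):
    """Report both the preserved URI and the rewritten path."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    line = '%s -> %s' % (environ['REQUEST_URI'], environ['PATH_INFO'])
    return [line.encode('ascii')]


environ = {'SCRIPT_NAME': '', 'PATH_INFO': '/old', 'QUERY_STRING': 'x=1'}
stack = preserve_request_uri(rewrite_path(show_app, '/old', '/new'))
body = stack(environ, lambda status, headers: None)
print(body[0].decode('ascii'))  # /old?x=1 -> /new
```

The application still sees the rewritten PATH_INFO, but it can build 
links from REQUEST_URI without having to backtrack through the rewrite.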

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

