[Web-SIG] Relationship between SCRIPT_NAME and PATH_INFO.

Ian Bicking ianb at colorstudy.com
Sun Jan 28 20:46:21 CET 2007


Graham Dumpleton wrote:
> In the PEP it says:
> 
> SCRIPT_NAME
>    The initial portion of the request URL's "path" that corresponds
>    to the application object, so that the application knows its virtual
>    "location". This may be an empty string, if the application
>    corresponds to the "root" of the server.
> 
> PATH_INFO
>    The remainder of the request URL's "path", designating the virtual
>    "location" of the request's target within the application. This may
>    be an empty string, if the request URL targets the application root
>    and does not have a trailing slash.
> 
> Seeking further clarification on what happens in certain circumstances,
> paste.lint says:
> 
>    - That SCRIPT_NAME and PATH_INFO are empty or start with /
> 
>    - That at least one of SCRIPT_NAME or PATH_INFO are set.
> 
>    - That SCRIPT_NAME is not '/' (it should be '', and PATH_INFO should
>      be '/').
> 
> As illustration of what this appears to all mean:
> 
>    Mount Point: /application
> 
>    Request URL: /application/something
> 
> yields:
> 
>    SCRIPT_NAME: /application
>    PATH_INFO: /something
> 
> and:
> 
>    Request URL: /application
> 
> yields:
> 
>    SCRIPT_NAME: /application
> 
> with PATH_INFO not needing to actually be defined as it will be empty.

Note that PEP 333 conflicts with the CGI specification here -- the CGI 
specification says that PATH_INFO (and SCRIPT_NAME) must always be 
present even when empty.  Since PEP 333 references the CGI spec, it's a 
bit inconsistent here.

It would be nice if PEP 333 said that PATH_INFO and SCRIPT_NAME SHOULD 
be set, and if wsgiref.validate produced a warning (but not an exception) 
when either is missing.
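
A rough sketch of what "warn, don't raise" could look like as a piece of 
middleware. This is not what wsgiref.validate does; the name 
warn_on_missing_path_keys is made up for illustration, and filling in the 
empty-string default is an extra assumption on top of the warning itself:

```python
import warnings


def warn_on_missing_path_keys(app):
    """Wrap a WSGI app; warn (rather than raise) if a path key is absent.

    Hypothetical helper, not part of wsgiref or paste.
    """
    def wrapper(environ, start_response):
        for key in ('SCRIPT_NAME', 'PATH_INFO'):
            if key not in environ:
                warnings.warn('%s is missing from the WSGI environ' % key)
                # Assumption: supply the CGI-style empty default so the
                # wrapped application can index the key directly.
                environ[key] = ''
        return app(environ, start_response)
    return wrapper
```

Wrapping an application this way would let it read environ['PATH_INFO'] 
without a .get(), while still flagging servers that leave the key out.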

> Now my questions revolve around where an application is mounted
> at a URL which itself has a trailing slash. For example, in Apache
> one can say:
> 
>    <Location /application/>
>    ...
>    </Location>
> 
> If a request arrives which is for '/application', it will not actually
> be directed to the application because it doesn't have the required
> trailing slash and so will not match the path in the directive.

Apache does weird things with Alias too, which as an Apache user drives 
me nuts.  E.g., if you do:

   Alias /foo /path

Then /foobar goes to /pathbar.  Nuts.  But if you do:

   Alias /foo/ /path/

Then /foo doesn't work.  I think we should just avoid this stupid 
behavior and act intelligently with respect to trailing slashes.
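
To make the difference concrete, here is a small sketch comparing a plain 
string-prefix test with one that respects path segments. It only models 
the two behaviors described above; it is not a description of Apache's 
internals, and both function names are invented for the example:

```python
def naive_match(prefix, path):
    # Plain string-prefix test: this is what lets /foobar match /foo,
    # and what makes /foo fail against a prefix of /foo/.
    return path.startswith(prefix)


def segment_match(prefix, path):
    # Ignore a trailing slash on the prefix, then require the match to
    # end exactly at a path-segment boundary.
    prefix = prefix.rstrip('/')
    return path == prefix or path.startswith(prefix + '/')
```

With these, naive_match('/foo', '/foobar') is true and 
naive_match('/foo/', '/foo') is false, while segment_match rejects 
'/foobar' and accepts '/foo' for either spelling of the prefix.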

> In effect the mount point of the application is '/application/'. One
> cannot treat the mount point as being '/application' as if that is then
> used by user code to reference back to the root of the application for
> a link or redirect it will not actually work as the trailing slash is
> missing.
> 
> Thus, this would suggest that for this case that one would have:
> 
>    SCRIPT_NAME: /application/
> 
> This though doesn't seem to marry up with WSGI very well. This is
> because reconstruction of URLs indicates that all that is required is
> to join SCRIPT_NAME and PATH_INFO back together. Ie.,
> 
>    from urllib import quote
>    url = environ['wsgi.url_scheme']+'://'
> 
>    if environ.get('HTTP_HOST'):
>        url += environ['HTTP_HOST']
>    else:
>        url += environ['SERVER_NAME']
> 
>        if environ['wsgi.url_scheme'] == 'https':
>            if environ['SERVER_PORT'] != '443':
>               url += ':' + environ['SERVER_PORT']
>        else:
>            if environ['SERVER_PORT'] != '80':
>               url += ':' + environ['SERVER_PORT']
> 
>    url += quote(environ.get('SCRIPT_NAME',''))
>    url += quote(environ.get('PATH_INFO',''))
>    if environ.get('QUERY_STRING'):
>        url += '?' + environ['QUERY_STRING']
> 
> If that is seen as being the case, then we would need to have:
> 
>    Mount Point: /application/
> 
>    Request URL: /application/something
> 
> yields:
> 
>    SCRIPT_NAME: /application/
>    PATH_INFO: something

IMHO it's up to the dispatcher to make sure this sort of thing just 
doesn't happen.  In paste.urlmap I allow a trailing slash to be 
specified in a mount point, but ignore it, preferring instead to enforce 
internal consistency.  This seems to avoid the question you are bringing 
up here. In paste.urlmap when I get a request for '/application' I do 
the redirect in the dispatcher to '/application/', and I don't allow a 
mount point of '/application' to match '/applicationplussome'.

The nice part of this is that when you've coded it in the dispatcher It 
Just Works for people using that dispatcher, and they don't have to 
think about any of these WSGI details ;)
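
For illustration, a minimal dispatcher along the lines described above. 
This is a sketch under stated assumptions, not paste.urlmap's actual 
code: make_dispatcher and its arguments are hypothetical, the redirect 
uses a relative Location and drops any query string, and mount points 
are matched longest first:

```python
def make_dispatcher(mounts, default_app):
    """Build a WSGI dispatcher from a dict of mount point -> WSGI app.

    Hypothetical helper for illustration only.
    """
    # Drop any trailing slash so '/application/' and '/application'
    # name the same mount point; try longer mount points first.
    table = sorted(((mount.rstrip('/'), app) for mount, app in mounts.items()),
                   key=lambda item: len(item[0]), reverse=True)

    def dispatcher(environ, start_response):
        script_name = environ.get('SCRIPT_NAME', '')
        path_info = environ.get('PATH_INFO', '')
        for mount, app in table:
            if mount and path_info == mount:
                # Bare mount point: redirect to the trailing-slash form.
                start_response('301 Moved Permanently',
                               [('Location', script_name + mount + '/')])
                return [b'']
            if path_info.startswith(mount + '/'):
                # Shift the mount point from PATH_INFO onto SCRIPT_NAME,
                # so SCRIPT_NAME never ends in '/' and PATH_INFO always
                # starts with '/'.
                environ['SCRIPT_NAME'] = script_name + mount
                environ['PATH_INFO'] = path_info[len(mount):]
                return app(environ, start_response)
        return default_app(environ, start_response)

    return dispatcher
```

Mounted at '/application/', a request for '/application/something' 
reaches the application with SCRIPT_NAME '/application' and PATH_INFO 
'/something', a request for '/application' gets the redirect, and 
'/applicationplussome' falls through to default_app.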


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

