From me at essienitaessien.com  Thu Jan  4 21:43:09 2007
From: me at essienitaessien.com (Essien Ita Essien)
Date: Thu, 04 Jan 2007 21:43:09 +0100
Subject: [Web-SIG] simpleweb 0.7.2 in cheeseshop
Message-ID: <459D66DD.2000904@essienitaessien.com>

Hi Everybody,

I'm pleased to announce that I've just uploaded simpleweb-0.7.2 to cheeseshop.

What Is simpleweb
===================

simpleweb is a simple, WSGI-compliant Python web framework, inspired by 
Django, TurboGears and web.py.

simpleweb is a result of working closely with web.py, TurboGears, 
Django, and a very hefty dose of opinionation(tm) :)

Like TurboGears, it builds on existing python and wsgi components, to 
keep things simple, and just connects these components in a very easy 
transparent way.

Like web.py its dispatching mechanism is matched strictly to HTTP methods.
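
As a rough illustration of dispatching strictly on the HTTP method, here
is a minimal plain-WSGI sketch. It is not simpleweb's actual API: the
Hello controller, method_dispatch() and call() are hypothetical names
invented for this example, and only the standard library is assumed.

```python
from wsgiref.util import setup_testing_defaults


class Hello:
    """Hypothetical controller: one method per supported HTTP verb."""

    def GET(self, environ):
        return "200 OK", b"hello via GET"

    def POST(self, environ):
        return "200 OK", b"hello via POST"


def method_dispatch(controller):
    """Wrap a controller in a WSGI app that routes on REQUEST_METHOD."""

    def app(environ, start_response):
        method = environ["REQUEST_METHOD"]
        # Only plain upper-case verbs are looked up, so a request cannot
        # reach private or dunder attributes of the controller.
        handler = None
        if method.isalpha() and method.isupper():
            handler = getattr(controller, method, None)
        if handler is None:
            status, body = "405 Method Not Allowed", b"method not allowed"
        else:
            status, body = handler(environ)
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]

    return app


def call(app, method):
    """Invoke a WSGI app in-process and return (status, body)."""
    environ = {"REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body
```

A verb the controller does not define (DELETE here) falls through to a
405 response rather than to some default handler.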

http://simpleweb.essienitaessien.com for details.

Major Changes In This Version
=============================

1. Dependencies reduction (simpleweb now has ONLY ONE non-optional 
dependency).

2. create-movable plugin. This makes the project folder self-sufficient, 
so it can be deployed WITHOUT installing any libraries on the target 
system. Not even simpleweb needs to be installed on the target machine!

The website http://simpleweb.essienitaessien.com has been updated to 
reflect the changes.

With this new release, the recommended plot is:
	1 - Install simpleweb and its dependencies on development machine where 
developer probably controls python version, and library environment/site

	2 - When complete and ready for deployment, use simpleweb-admin 
create-movable, to create a self contained deployment folder. Deploy 
this self contained folder on the target hosting environment. This way, 
various versions of applications can be deployed with various versions 
of simpleweb.

	If you control the hosting environment, you can also choose to install 
simpleweb and dependencies system wide, and deploy normally. I'm now 
deploying the web app on http://simpleweb.essienitaessien.com this way.

I'll really appreciate tests and feedback. I'm not rushing to add more 
features for now, but will spend time to make these more robust.


ChangeLog 0.7.2 - Dependency Reduction Branch
======================================================
	* Add 'create-movable' plugin to simpleweb-admin
	* Initial work on dependency reduction
		* Introduced simpleweb.extlib [resolver, selector, yaro, memento, static]
		* All references to the above libraries are now internal
	* Made 'flup' an optional dependency
		* Dev server will warn that sessions are not available if
		config.enable_sessions is set
		* FCGI deployment will send a message to stderr if flup.server.fcgi
		is not available.
	* Made sqlobject, sqlalchemy and Cheetah optional dependencies;
	attempting to enable them as plugins in config.py will cause an error
	alerting that they can't be used unless installed.
	* simpleweb.utils.optional_dependecies_err() -> to consistently report
	issues
	* oh! modified the internal server banner again :)
	* utils.from_import() now properly raises exceptions instead of calling
	sys.exit()
	* simpleweb no longer crashes if a controller specified in urls.py doesn't
	exist. It sends a warning to stderr instead. If an attempt is made to
	access that url, an HTTP 404 is raised.
	* unify error/info/warn reporting across simpleweb with
	simpleweb.utils.msg_[err | warn | info]
	* Add SimpleErrorMiddleware, and use that to handle errors in
	SimplewebReloadingApp().
	* Correct the simpleweb-admin 'help' plugin to properly
		print help for create-movable plugin. Also, set
		ground-work for converting badly named plugins (e.g.
		createproject should be renamed to create-project, etc)

From foobarbazbaz at yahoo.com  Mon Jan  8 17:38:19 2007
From: foobarbazbaz at yahoo.com (Foobar BazBaz)
Date: Mon, 8 Jan 2007 08:38:19 -0800 (PST)
Subject: [Web-SIG] WSGI multi threading indications?
Message-ID: <231745.7626.qm@web58807.mail.re1.yahoo.com>

I'm using wsgiref.simple_server running behind Apache.
(Created using wsgiref.simple_server.make_server)

I see:
  wsgi.multiprocess is False
  wsgi.multithread is True
  wsgi.run_once is False

I'm surprised by the value of multithread, since it appears
(and looking at the code seems to verify) that additional
threads are never created; i.e. the server synchronously
handles one request at a time.

Am I missing something?  Is there a better choice for
an out-of-the-box server to work in a WSGI
environment?

Thanks,
Steve



From pje at telecommunity.com  Mon Jan  8 18:12:57 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 08 Jan 2007 12:12:57 -0500
Subject: [Web-SIG] WSGI multi threading indications?
In-Reply-To: <231745.7626.qm@web58807.mail.re1.yahoo.com>
Message-ID: <5.1.1.6.0.20070108120917.04b85cd8@sparrow.telecommunity.com>

At 08:38 AM 1/8/2007 -0800, Foobar BazBaz wrote:
>I'm using wsgiref.simple_server running behind Apache.
>(Created using wsgiref.simple_server.make_server)
>
>I see:
>   wsgi.multiprocess is False
>   wsgi.multithread is True
>   wsgi.run_once is False
>
>I'm surprised by the value of multithread, since it
>appears
>(and looking at the code seems to verify) that
>additional
>threads are never created; i.e. the server
>synchronously
>handles one request at a time.
>
>Am I missing something?

The simple_server never creates multiple threads, but it's potentially 
multi-threadable.  It probably shouldn't be saying multithread unless it 
knows it is.  The WSGIRequestHandler.handle() method is the culprit; it 
should probably be checking self.server in some way and passing an 
appropriate multithread flag to the newly created ServerHandler.
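
A minimal sketch of how that flag reaches the application, assuming the
Python 3 standard library, where ServerHandler derives from
wsgiref.handlers.SimpleHandler and the constructor accepts multithread
and multiprocess arguments. report_flags() and run_once_with() are names
invented for this illustration; no real socket is involved.

```python
import io
import sys
from wsgiref.handlers import SimpleHandler


def report_flags(environ, start_response):
    """WSGI app that echoes the three concurrency hints from environ."""
    body = "multithread=%s multiprocess=%s run_once=%s" % (
        environ["wsgi.multithread"],
        environ["wsgi.multiprocess"],
        environ["wsgi.run_once"],
    )
    body = body.encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def run_once_with(multithread):
    """Run report_flags through SimpleHandler; return the raw response."""
    stdout = io.BytesIO()
    # SERVER_PROTOCOL is needed because the handler inspects it when
    # writing the status line.
    base_env = {"REQUEST_METHOD": "GET", "SERVER_PROTOCOL": "HTTP/1.0"}
    handler = SimpleHandler(io.BytesIO(), stdout, sys.stderr, base_env,
                            multithread=multithread, multiprocess=False)
    handler.run(report_flags)
    return stdout.getvalue()
```

Whatever value the code that builds the handler passes in is what the
application sees, which is why the request handler is the place to
report the server's real threading model.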


From janssen at parc.com  Mon Jan  8 18:34:47 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 8 Jan 2007 09:34:47 PST
Subject: [Web-SIG] cleaning up the standard library's Web support
Message-ID: <07Jan8.093450pst."58648"@synergy1.parc.xerox.com>

There's a thread going on on the Python-3000 list about PEP 3108,
which proposes a clean-up/re-org for the standard library.

I've suggested that, analogous to the "email" package, a "web" package
be created, and most (all?) of the web-related modules be moved under
it.  This is also a chance to remove cruft and combine related modules
(urllib.py and urllib2.py, for instance).

To take another example, should BaseHTTPServer and SimpleHTTPServer
both exist?  Shouldn't SimpleHTTPServer.SimpleHTTPRequestHandler just
be another class defined in BaseHTTPServer?  Or should
SimpleHTTPServer just be deleted altogether?

Bill

From fumanchu at amor.org  Mon Jan  8 19:08:22 2007
From: fumanchu at amor.org (Robert Brewer)
Date: Mon, 8 Jan 2007 10:08:22 -0800
Subject: [Web-SIG] WSGI multi threading indications?
References: <231745.7626.qm@web58807.mail.re1.yahoo.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224D06@ex9.hostedexchange.local>

Foobar BazBaz wrote:
> I'm using wsgiref.simple_server running behind Apache.
> (Created using wsgiref.simple_server.make_server)
> 
> I see:
>   wsgi.multiprocess is False
>   wsgi.multithread is True
>   wsgi.run_once is False
> 
> I'm surprised by the value of multithread, since it
> appears (and looking at the code seems to verify) that
> additional threads are never created; i.e. the server
> synchronously handles one request at a time.
> 
> Am I missing something?  Is there a better choice for
> an out-of-the-box server to work in a WSGI
> environment?

There's no better choice for that particular reason. ;) You'll have to manually tell any WSGI server what environment it's running in, because mod_proxy/mod_rewrite doesn't include that metadata by default. There's probably a way to send a custom header from Apache up to the WSGI server, but that would be by convention only (at this point).

If you use mod_python (3.1 or better) instead of proxy/rewrite, you can inspect apache.mpm_query(apache.AP_MPMQ_IS_THREADED) and apache.mpm_query(apache.AP_MPMQ_IS_FORKED) as http://projects.amor.org/misc/wiki/ModPythonGateway does.


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org

From pje at telecommunity.com  Mon Jan  8 19:53:42 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 08 Jan 2007 13:53:42 -0500
Subject: [Web-SIG] WSGI multi threading indications?
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224D06@ex9.hostedexchange.local>
References: <231745.7626.qm@web58807.mail.re1.yahoo.com>
Message-ID: <5.1.1.6.0.20070108135154.0289e5a0@sparrow.telecommunity.com>

At 10:08 AM 1/8/2007 -0800, Robert Brewer wrote:
>You'll have to manually tell any WSGI server what environment it's running 
>in, because mod_proxy/mod_rewrite doesn't include that metadata by 
>default. There's probably a way to send a custom header from Apache up to 
>the WSGI server, but that would be by convention only (at this point).

Actually, the original poster's question is with respect to a standalone 
server program, so Apache would have no way to know what mode it (the 
standalone Python program) is running in!


>If you use mod_python (3.1 or better) instead of proxy/rewrite, you can 
>inspect apache.mpm_query(apache.AP_MPMQ_IS_THREADED) and 
>apache.mpm_query(apache.AP_MPMQ_IS_FORKED) as 
>http://projects.amor.org/misc/wiki/ModPythonGateway
>does.

Not in the OP's case, since he's using a standalone program.  It doesn't 
matter what Apache's threading model is, he needs the threading model of 
*his* program.  :)


From foobarbazbaz at yahoo.com  Mon Jan  8 21:54:13 2007
From: foobarbazbaz at yahoo.com (Foobar BazBaz)
Date: Mon, 8 Jan 2007 12:54:13 -0800 (PST)
Subject: [Web-SIG] WSGI multi threading indications?
In-Reply-To: <5.1.1.6.0.20070108120917.04b85cd8@sparrow.telecommunity.com>
Message-ID: <20070108205413.14031.qmail@web58807.mail.re1.yahoo.com>


--- "Phillip J. Eby" <pje at telecommunity.com> wrote:

> At 08:38 AM 1/8/2007 -0800, Foobar BazBaz wrote:
> >I'm using wsgiref.simple_server running behind
> Apache.
> >(Created using wsgiref.simple_server.make_server)
> >
> >I see:
> >   wsgi.multiprocess is False
> >   wsgi.multithread is True
> >   wsgi.run_once is False
> >
> >I'm surprised by the value of multithread, since it
> >appears
> >(and looking at the code seems to verify) that
> >additional
> >threads are never created; i.e. the server
> >synchronously
> >handles one request at a time.
> >
> >Am I missing something?
> 
> The simple_server never creates multiple threads,
> but it's potentially 
> multi-threadable.  It probably shouldn't be saying
> multithread unless it 
> knows it is.  The WSGIRequestHandler.handle() method
> is the culprit; it 
> should probably be checking self.server in some way
> and passing an 
> appropriate multithread flag to the newly created
> ServerHandler.
> 
> 

Phillip,

Yes... that's it exactly.  Looking at the code, it appears
that subclassing WSGIRequestHandler and overriding the handle
method to dispatch to one of a pool of handler threads
might be an easy way to get reasonable performance
from this still-simple approach.

At least it *looks* to me like the handle method would be
the place to make the transformation to thread-per-request.
Comments?

Thanks,
Steve 


From pje at telecommunity.com  Mon Jan  8 23:02:08 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 08 Jan 2007 17:02:08 -0500
Subject: [Web-SIG] WSGI multi threading indications?
In-Reply-To: <20070108205413.14031.qmail@web58807.mail.re1.yahoo.com>
References: <5.1.1.6.0.20070108120917.04b85cd8@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20070108165853.028800d8@sparrow.telecommunity.com>

At 12:54 PM 1/8/2007 -0800, Foobar BazBaz wrote:
>Yes... that's it exactly.  Looking at the code, it appears
>that subclassing WSGIRequestHandler and overriding the handle
>method to dispatch to one of a pool of handler threads
>might be an easy way to get reasonable performance
>from this still-simple approach.
>
>At least it *looks* to me like the handle method would be
>the place to make the transformation to thread-per-request.

No, that's the place where you'd change it to set wsgi.multithread to 
False.  :)

If you want to make it *actually* multithreaded, you need to subclass 
WSGIServer and mix in SocketServer.ThreadingMixIn, e.g.:

    class MultiWSGI(ThreadingMixIn, WSGIServer):
        ...


From me at essienitaessien.com  Fri Jan 12 18:23:26 2007
From: me at essienitaessien.com (Essien Ita Essien)
Date: Fri, 12 Jan 2007 18:23:26 +0100
Subject: [Web-SIG] simpleweb 0.7.3 in cheeseshop
Message-ID: <45A7C40E.6070009@essienitaessien.com>

Hi Everybody,

I'm pleased to announce that I've just uploaded simpleweb-0.7.3 to cheeseshop.

What Is simpleweb
===================

simpleweb is a simple, WSGI-compliant Python web framework, inspired by
Django, TurboGears and web.py.

simpleweb is a result of working closely with web.py, TurboGears,
Django, and a very hefty dose of opinionation(tm) :)

Like TurboGears, it builds on existing python and wsgi components, to
keep things simple, and just connects these components in a very easy
transparent way.

Like web.py its dispatching mechanism is matched strictly to HTTP methods.

http://simpleweb.essienitaessien.com for details.

Major Changes In This Version
=============================

Plugin command readability and consistency and code fixups.


ChangeLog:
=========
http://simpleweb-py.googlecode.com/svn/tags/0.7.3/src/simpleweb/ChangeLog

I'll really appreciate tests and feedback. I'm not rushing to add more
features for now, but will spend time to make these more robust.

Patches are welcome.

Cheers,
Essien


From grahamd at dscpl.com.au  Mon Jan 15 00:22:07 2007
From: grahamd at dscpl.com.au (Graham Dumpleton)
Date: Sun, 14 Jan 2007 18:22:07 -0500
Subject: [Web-SIG] WSGI input filter that changes content length.
Message-ID: <1168816926.29590@dscpl.user.openhosting.com>

How does one implement in WSGI an input filter that manipulates the request
body in such a way that the effective content length would be changed?

In the WSGI PEP it says:

  CONTENT_LENGTH
    The contents of any Content-Length fields in the HTTP request. May be
    empty or absent.

Also, it says:

  The server is not required to read past the client's specified Content-
  Length, and is allowed to simulate an end-of-file condition if the
  application attempts to read past that point. The application should not
  attempt to read more data than is specified by the CONTENT_LENGTH
  variable.

Is the absence of CONTENT_LENGTH meant to imply that the content length is
actually 0, i.e., no content, or is it allowed to indicate that the application
should perform a read() with no argument to get all data that may be present
and from the data returned infer the actual content length?

The problem I am trying to address here is how one might implement using WSGI a
decompression filter for the body of a request. Ie., where "Content-Encoding:
gzip" has been specified.

In this situation when start_response() for the middleware is called, it will
know that the content length is likely to change but not what the new content
length will actually be. As a consequence, the only thing it can really do at
that point is zap the CONTENT_LENGTH to indicate that the value can't actually
be trusted.

The only other option would be to have at the start_response() phase the
middleware actually read the data in, decompress it and buffer it. Having done
this it will know what the new content length value would be and could change
CONTENT_LENGTH before calling start_response() on the downstream application.
Doing this has various downsides though. The first is that, if HTTP/1.1 is
used, the read can trigger a 100 Continue to be sent back to the client before
the real consumer application is ready to start using the data. The application
may eventually decide, before even attempting to consume the data, that it
wants to reject the request, but at that point it is too late inasmuch as the
data has already been consumed by the middleware, with the client unnecessarily
having sent the data. The other downside is the need to buffer the data. If it
is a small amount of data then in-memory buffering may suffice, but if it is
huge then disk-based caching would be necessary.

So, how is one meant to deal with this in WSGI?

Graham

From pywebsig at alan.kennedy.name  Mon Jan 15 11:56:37 2007
From: pywebsig at alan.kennedy.name (Alan Kennedy)
Date: Mon, 15 Jan 2007 10:56:37 +0000
Subject: [Web-SIG] WSGI input filter that changes content length.
In-Reply-To: <1168816926.29590@dscpl.user.openhosting.com>
References: <1168816926.29590@dscpl.user.openhosting.com>
Message-ID: <4a951aa00701150256j4b534793m86759c26a45f06e7@mail.gmail.com>

[Graham Dumpleton]
> How does one implement in WSGI an input filter that manipulates the request
> body in such a way that the effective content length would be changed?

> The problem I am trying to address here is how one might implement using WSGI a
> decompression filter for the body of a request. Ie., where "Content-Encoding:
> gzip" has been specified.

> So, how is one meant to deal with this in WSGI?

The usual approach to modifying something in the WSGI
environment, in this case the wsgi.input file-like object, is to wrap
it or replace it with an object that behaves as desired.

In this case, the approach I would take would be to wrap the
wsgi.input object with a gzip.GzipFile object, which should only read
the input stream data on demand. The code would look like this:

import gzip
wsgi_env['wsgi.input'] = gzip.GzipFile(fileobj=wsgi_env['wsgi.input'], mode='rb')

Notes.

1. The application should be completely unaware that it is dealing
with a compressed stream: it simply reads from wsgi.input, unaware
that reading from what it thinks is the input stream is actually causing
cascading reads down a series of file-like objects.

2. The GzipFile object will decompress on the fly, meaning that it
will only read from the wrapped input stream when it needs input.
Which means that if the application does not read data from
wsgi.input, then no data will be read from the client connection.

3. The GzipFile should not be responsible for enforcement of the
incoming Content-Length boundary. Instead, this should be enforced by
the original server-provided file-like input stream that it wraps. So
if the application attempts to read past Content-Length bytes, the
server-provided input stream "is allowed to simulate an end-of-file
condition". Which would cause the GzipFile to return an EOF to the
application, or possibly an exception.

4. Because of the on-the-fly nature of the GzipFile decompression, it
would not be possible to provide a meaningful Content-Length value to
the application. To do so would require buffering and decompressing
the entire input data stream. But the application should still be able
to operate without knowing Content-Length.

5. The wrapping can NOT be done in middleware. PEP 333, Section "Other
HTTP Features" has this to say: "WSGI applications must not generate
any "hop-by-hop" headers [4], attempt to use HTTP features that would
require them to generate such headers, or rely on the content of any
incoming "hop-by-hop" headers in the environ dictionary. WSGI servers
must handle any supported inbound "hop-by-hop" headers on their own,
such as by decoding any inbound Transfer-Encoding, including chunked
encoding if applicable." So the wrapping and replacement of wsgi.input
should happen in the server or gateway, NOT in middleware.

6. Exactly the same principles should apply to decoding incoming
Transfer-Encoding: chunked.
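
A minimal sketch of note 2 above, assuming Python 3 and only the
standard library. CountingStream is a hypothetical stand-in for the
server-provided wsgi.input, and wrap_input() and demo() are names
invented here. It checks that wrapping the stream reads nothing from it,
and that bytes are pulled only once the application reads.

```python
import gzip
import io


class CountingStream:
    """Stand-in for wsgi.input that counts the bytes pulled from it."""

    def __init__(self, data):
        self._raw = io.BytesIO(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self._raw.read(size)
        self.bytes_read += len(chunk)
        return chunk


def wrap_input(environ):
    """Replace wsgi.input with a GzipFile that decompresses on demand."""
    # The stream has to be passed as fileobj; the first positional
    # parameter of GzipFile is a filename.
    environ["wsgi.input"] = gzip.GzipFile(fileobj=environ["wsgi.input"],
                                          mode="rb")
    return environ


def demo():
    payload = b"hello wsgi " * 100
    source = CountingStream(gzip.compress(payload))
    environ = wrap_input({"wsgi.input": source})
    read_before = source.bytes_read      # taken before the app reads
    body = environ["wsgi.input"].read()  # the application's read
    return read_before, source.bytes_read, body == payload
```

The wrapped object only needs a read() method here, so a non-seekable
stream such as a socket-backed wsgi.input would serve as well.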

HTH,

Alan.

P.S. Thanks for all your great work on mod_python Graham!

From grahamd at dscpl.com.au  Mon Jan 15 12:49:49 2007
From: grahamd at dscpl.com.au (Graham Dumpleton)
Date: Mon, 15 Jan 2007 06:49:49 -0500
Subject: [Web-SIG] WSGI input filter that changes content length.
Message-ID: <1168861789.27614@dscpl.user.openhosting.com>

Alan Kennedy wrote ..
> [Graham Dumpleton]
> > How does one implement in WSGI an input filter that manipulates the request
> > body in such a way that the effective content length would be changed?
> 
> > The problem I am trying to address here is how one might implement using
> WSGI a
> > decompression filter for the body of a request. Ie., where "Content-Encoding:
> > gzip" has been specified.
> 
> > So, how is one meant to deal with this in WSGI?
> 
> The usual approach to modifying something in the WSGI
> environment, in this case the wsgi.input file-like object, is to wrap
> it or replace it with an object that behaves as desired.
> 
> In this case, the approach I would take would be to wrap the
> wsgi.input object with a gzip.GzipFile object, which should only read
> the input stream data on demand. The code would look like this
> 
> import gzip
> wsgi_env['wsgi.input'] = gzip.GzipFile(fileobj=wsgi_env['wsgi.input'], mode='rb')
> 
> Notes.
> 
> 1. The application should be completely unaware that it is dealing
> with a compressed stream: it simply reads from wsgi.input, unaware
> that reading from what it thinks is the input stream is actually causing
> cascading reads down a series of file-like objects.
> 
> 2. The GzipFile object will decompress on the fly, meaning that it
> will only read from the wrapped input stream when it needs input.
> Which means that if the application does not read data from
> wsgi.input, then no data will be read from the client connection.

Hmmm, maybe I should have phrased my question a bit differently as, to be
honest, I am not actually interested in doing on-the-fly decompression and
only used it as an example. I really only want to know how the
content length is supposed to be dealt with. I didn't want to explain the
actual context for the question as I didn't want to let on yet what I am up
to, so I used an example which I thought would illustrate the problem.

> 3. The GzipFile should not be responsible for enforcement of the
> incoming Content-Length boundary. Instead, this should be enforced by
> the original server-provided file-like input stream that it wraps. So
> if the application attempts to read past Content-Length bytes, the
> server-provided input stream "is allowed to simulate an end-of-file
> condition". Which would cause the GzipFile to return an EOF to the
> application, or possibly an exception.
>
> 4. Because of the on-the-fly nature of the GzipFile decompression, it
> would not be possible to provide a meaningful Content-Length value to
> the application. To do so would require buffering and decompressing
> the entire input data stream. But the application should still be able
> to operate without knowing Content-Length.

I am not sure this fully answers what I want to know. If I leave the
content length header as is and any application does a
read(content_length) and decompression or some other input filter
actually results in more data than that being available, the application
will not get it all as it has only asked to read the original length
before decompression.

The PEP says that an application though should not attempt to read more
data than has been specified by the content length. If it is common
practice that applications take this literally and always get data from
the input by using read(content_length) then there is a requirement that
the content length header must exist. Thus, if the input filter does zap
the content length header and remove it then an application which does
that will not work.

Thus the question probably is, what is accepted practice or what does
the PEP dictate as to how applications should use read()?

Is it accepted as the norm that applications will always do
read(content_length), and thus zapping the content length is
unacceptable? Or, where an application doesn't need to know the
content length up front (unlike, for example, the proxy in paste, which
needs to pass it downstream), would applications always just use
read() with no argument and get all data, or at worst read it in
chunks until read() returns an empty string? BTW, yes I know that they
could use readline(), readlines() or __iter__(), but let's look at this just
in terms of read() for now.

So, is it okay to remove the content length header when there is actually
data and I know it wouldn't actually be correct, or does that result in a
situation that is seen as violating the PEP or, even if acceptable, would break
existing WSGI applications?

Or in short, is it mandatory that the content length header exist if there is
non-zero-length data in input? I know the PEP says that the content length
may be empty or absent, but I am concerned that applications would assume
it has a value of 0 if empty or absent.

> 5. The wrapping can NOT be done in middleware. PEP 333, Section "Other
> HTTP Features" has this to say: "WSGI applications must not generate
> any "hop-by-hop" headers [4], attempt to use HTTP features that would
> require them to generate such headers, or rely on the content of any
> incoming "hop-by-hop" headers in the environ dictionary. WSGI servers
> must handle any supported inbound "hop-by-hop" headers on their own,
> such as by decoding any inbound Transfer-Encoding, including chunked
> encoding if applicable." So the wrapping and replacement of wsgi.input
> should happen in the server or gateway, NOT in middleware.
> 
> 6. Exactly the same principles should apply to decoding incoming
> Transfer-Encoding: chunked.

My understanding is that content encoding is different to transfer encoding,
ie., is not hop by hop in this sense and that the same statements don't apply.
I could well be wrong though. But even if this is the case, the underlying
server itself may not be able to guarantee that the content length header
itself is valid if it is doing the decompression using its own filter. Thus, it
may itself want to zap the content length header before anything is even
handed off to a WSGI stack. Therefore at the very outset the root application
may get no content length header but there is still data to read. I know this
may cause issues if an application checks for a content length header and
if not found raises the HTTP error response indicating that length is required,
but ignoring that, if in general to use read() content length header must always
exist, then it effectively means that an underlying web server can never use
any input filters of its own which would change such things as the content
length of data.

If this is the case, then it seems that a WSGI adapter for a specific web server,
if it can detect that the web server is going to apply a filter of its own which is going
to change the content length, should possibly respond with some sort
of error before it even hands the request off to the WSGI stack, so as to avoid problems.

In other words, the adapter should flag a configuration issue with an error
to cause the server admin to ensure that all web server input filters are disabled for
URLs that are being passed through to a WSGI application. Ie., leave everything up
to WSGI and not try and do things itself. But then if one does leave everything
up to WSGI, then how to solve the issue of how it can implement decompression
itself and will zapping the content length cause failure of existing applications,
thus back to my original question.

Hope you can follow what I am going on about.

> P.S. Thanks for all your great work on mod_python Graham!

Wait till you see what I am about to come out with if I can sort this issue out. :-)

Graham

From pywebsig at alan.kennedy.name  Mon Jan 15 13:47:24 2007
From: pywebsig at alan.kennedy.name (Alan Kennedy)
Date: Mon, 15 Jan 2007 12:47:24 +0000
Subject: [Web-SIG] WSGI input filter that changes content length.
In-Reply-To: <1168861789.27614@dscpl.user.openhosting.com>
References: <1168861789.27614@dscpl.user.openhosting.com>
Message-ID: <4a951aa00701150447k170ce6efve0d0bc379b342bcd@mail.gmail.com>

[Graham]
> Hmmm, maybe I should have phrased my question a bit differently as to be
> honest I am not actually interested in doing on the fly decompression and
> only used it as an example. I really only want to know about how the
> content length is supposed to be dealt with. I didn't want to explain the
> actual context for the question as didn't want to let on yet to what I am up
> to, so used an example which I thought would illustrate the problem.

Point taken. But I think gzip encoding is a good example to illustrate
the issues.

[Graham]
> If I leave the
> content length header as is and any application does a
> read(content_length) and decompression or some other input filter
> actually results in more data than that being available, the application
> will not get it all as it has only asked to read the original length
> before decompression.

So obviously the Content-Length header cannot be left unmodified if
some transformation is in place that is altering the length of the
content.

There are two choices for how the wrapping should happen.

1. The ungzipping filter reads the entirety of the (possibly huge)
input, decompresses it, and makes it available in wsgi.input. The
Content-Length header is rewritten to reflect the length of the
decompressed content. The application has a valid Content-Length value, but
the server has had to buffer a potentially large input stream in order
to be able to provide that.

2. The ungzipping filter wraps the compressed stream, and decompresses
on demand and on-the-fly. In this case, it *must* delete the old
Content-Length header, which is now invalid. It cannot provide a new
value for Content-Length, since the final uncompressed length of the
input stream cannot be known.
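
The two choices can be sketched as WSGI wrappers, assuming Python 3 and
only the standard library. All function names below are invented for
this illustration. The streaming variant assumes that the
server-provided wsgi.input simulates end-of-file at the original
Content-Length, and removing HTTP_CONTENT_ENCODING after decoding is a
design choice of this sketch, not something the PEP prescribes.

```python
import gzip
import io


def is_gzipped(environ):
    encoding = environ.get("HTTP_CONTENT_ENCODING", "")
    return encoding.strip().lower() == "gzip"


def buffering_gunzip(app):
    """Choice 1: decompress up front and rewrite CONTENT_LENGTH."""
    def wrapper(environ, start_response):
        if is_gzipped(environ):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = gzip.decompress(environ["wsgi.input"].read(length))
            environ["wsgi.input"] = io.BytesIO(body)  # whole body in memory
            environ["CONTENT_LENGTH"] = str(len(body))
            del environ["HTTP_CONTENT_ENCODING"]
        return app(environ, start_response)
    return wrapper


def streaming_gunzip(app):
    """Choice 2: decompress on demand and delete CONTENT_LENGTH."""
    def wrapper(environ, start_response):
        if is_gzipped(environ):
            # Assumes the wrapped stream reports EOF at the end of the
            # compressed body; otherwise a length-limited reader is needed.
            environ["wsgi.input"] = gzip.GzipFile(
                fileobj=environ["wsgi.input"], mode="rb")
            environ.pop("CONTENT_LENGTH", None)  # old value is now wrong
            del environ["HTTP_CONTENT_ENCODING"]
        return app(environ, start_response)
    return wrapper


def echo_length_app(environ, start_response):
    """Demo app: reports CONTENT_LENGTH (or 'absent') and bytes read."""
    declared = environ.get("CONTENT_LENGTH", "absent")
    if declared == "absent":
        data = environ["wsgi.input"].read()
    else:
        data = environ["wsgi.input"].read(int(declared))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("%s %d" % (declared, len(data))).encode("ascii")]


def run(wrapped_app, payload):
    """Call a wrapped app in-process with a gzip-compressed body."""
    compressed = gzip.compress(payload)
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(compressed)),
        "HTTP_CONTENT_ENCODING": "gzip",
        "wsgi.input": io.BytesIO(compressed),
    }
    ignore = lambda status, headers, exc_info=None: None
    return b"".join(wrapped_app(environ, ignore))
```

With choice 1 the application sees a correct length at the cost of
buffering; with choice 2 it sees no length at all and has to read until
the stream is exhausted.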

[Graham]
> The PEP says that an application though should not attempt to read more
> data than has been specified by the content length. If it is common
> practice that applications take this literally and always get data from
> the input by using read(content_length) then there is a requirement that
> the content length header must exist. Thus, if the input filter does zap
> the content length header and remove it then an application which does
> that will not work.

Then I suppose that that application is not a fully-compliant WSGI application.

Scenario 2 outlined above is a perfectly valid scenario that can
happen, so an application that cannot deal with that scenario is not
robust.

> Thus the question probably is, what is accepted practice or what does
> the PEP dictate as to how applications should use read()?

AFAICT, the PEP is not prescriptive about the use of the
wsgi.input.read() method.

However, given that you have found it necessary to raise the question,
perhaps it should be added to the WSGI PEP that absence of a
Content-Length header does NOT imply absence of content.

[Graham]
> So, is it okay to remove the content length header when there is actually
> data and I know it wouldn't actually be correct,

I would say it's compulsory to remove the header: it contains an
incorrect value, and if the application uses that value, it will get
unexpected data or an exception, and rightly so.

[Graham]
> or does that result in a
> situation that is seen as violating the PEP or even if acceptable would break
> existing WSGI applications.

I would say that leaving an incorrect value in place should be a
violation of the PEP.

> Or in short, is it mandatory that content length header must exist if there is
> non zero length data in input? I know the PEP says that the content length
> may be empty or absent, but am concerned that applications would assume
> it has value of 0 if empty or absent.

No, the Content-Length header is optional, and any applications that
operate otherwise are non-compliant.
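
As a rough illustration of an application that copes with both cases, here is a sketch of a body reader. The helper name read_body is hypothetical, and the fallback branch assumes that the server or filter signals end of input on wsgi.input when no length is given, which is an assumption rather than something every server guarantees.

```python
import io


def read_body(environ):
    """Read the request body whether or not CONTENT_LENGTH is present."""
    stream = environ['wsgi.input']
    length = environ.get('CONTENT_LENGTH', '')
    if length:
        # A length was supplied: never read past it.
        return stream.read(int(length))
    # Empty or absent length does not mean "no content": read to the end.
    return stream.read()
```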

[Alan]
>> 6. Exactly the same principles should apply to decoding incoming
>> Transfer-Encoding: chunked.

[Graham]
> My understanding is that content encoding is different to transfer encoding,
> ie., is not hop by hop in this sense and that the same statements don't apply.

Hop-by-hop header means that the attribute described in the header is
not an inherent attribute of the content being transferred, but is
solely used in one stage of a multi-hop communication.

If my browser is using a proxy, which relays requests on to a server,
the proxy may decide to use Transfer-Encoding to communicate with the
server. Thus the Transfer-Encoding only applies to the proxy->server
"hop". If the server receives such a Transfer-Encoding, it *must*
decode the content according to that Transfer-Encoding before making
it available to the application.
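
For illustration, decoding a chunked body before handing it to the application could look roughly like this. The function name decode_chunked is hypothetical; the sketch reads the whole body into memory and ignores the contents of any trailer headers.

```python
import io


def decode_chunked(stream):
    """Decode a "Transfer-Encoding: chunked" body into plain bytes."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        # The chunk size is hexadecimal; text after ';' is a chunk extension.
        size = int(size_line.split(b';', 1)[0].strip(), 16)
        if size == 0:
            break
        body += stream.read(size)
        stream.readline()  # consume the CRLF that ends the chunk data
    # Skip optional trailer headers up to the terminating blank line.
    while stream.readline().strip():
        pass
    return bytes(body)
```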

[Graham]
> Wait till you see what I am about to come out with if I can sort this issue out. :-)

Now I'm intrigued :-)

Regards,

Alan.

From fumanchu at amor.org  Mon Jan 15 17:45:52 2007
From: fumanchu at amor.org (Robert Brewer)
Date: Mon, 15 Jan 2007 08:45:52 -0800
Subject: [Web-SIG] WSGI input filter that changes content length.
References: <1168861789.27614@dscpl.user.openhosting.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224D0F@ex9.hostedexchange.local>

Graham Dumpleton wrote:
> My understanding is that content encoding is different
> to transfer encoding, ie., is not hop by hop in this
> sense and that the same statements don't apply.

No, you're right. "Content-Encoding: gzip" is not hop-by-hop, and should therefore be allowed in middleware. But as you've noticed, that doesn't mean it's easy. ;)


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org



From grahamd at dscpl.com.au  Mon Jan 15 23:38:50 2007
From: grahamd at dscpl.com.au (Graham Dumpleton)
Date: Mon, 15 Jan 2007 17:38:50 -0500
Subject: [Web-SIG] WSGI input filter that changes content length.
Message-ID: <1168900730.28720@dscpl.user.openhosting.com>

Robert Brewer wrote ..
> Graham Dumpleton wrote:
> > My understanding is that content encoding is different
> > to transfer encoding, ie., is not hop by hop in this
> > sense and that the same statements don't apply.
> 
> No, you're right. "Content-Encoding: gzip" is not hop-by-hop, and should
> therefore be allowed in middleware. But as you've noticed, that doesn't
> mean it's easy. ;)

A good example of why it isn't hop by hop is that a client could use
'Content-Encoding: gzip' and both the web server and the WSGI stack could
simply not do anything about it, but a WSGI proxy middleware could then pass
the request as is through to some other web server which would then
decompress it. Because it can be passed through in this way, it isn't really
hop by hop.
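
The distinction can be checked with wsgiref.util.is_hop_by_hop() from the standard library, which reports whether a header name is one of the HTTP/1.1 hop-by-hop headers. A small sketch:

```python
from wsgiref.util import is_hop_by_hop

for name in ('Transfer-Encoding', 'Connection',
             'Content-Encoding', 'Content-Length'):
    # Hop-by-hop headers apply to a single connection only.
    print(name, is_hop_by_hop(name))
```

Transfer-Encoding and Connection are reported as hop-by-hop; Content-Encoding and Content-Length are not.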

Graham

From chad at zetaweb.com  Mon Jan 15 23:39:30 2007
From: chad at zetaweb.com (Chad Whitacre)
Date: Mon, 15 Jan 2007 17:39:30 -0500
Subject: [Web-SIG] [ANN] Aspen 0.7 -- WSGI + filesystem = sweet webserver
Message-ID: <45AC02A2.2050402@zetaweb.com>

Greetings, program!

I've just released Aspen 0.7. Aspen is a Python webserver, and 
this is the first version to be used in production. As such, I'm 
announcing it generally as well as to the Web-SIG.

This release is about making Aspen easy to configure, and making 
that configuration easy to get to from your WSGI modules. I'm 
pleased with the result, and would love to hear your feedback.

Also, allow me to thank the following for being Aspen's first 
contributors:

   * Giorgi Lekishvili, for initial optimization work
   * Maciek Starzyk, for keeping us honest on Windows
   * Vasilis Dimos, for providing benchmarks
   * alefnula, for feedback on supporting Range requests


Downloads, docs, and links are here:

   http://www.zetadev.com/software/aspen/


Thanks!


yours,
Chad Whitacre

----

http://www.zetadev.com/  <- FOSS
http://tech.whit537.org/ <- blog

From chad at zetaweb.com  Tue Jan 16 04:00:53 2007
From: chad at zetaweb.com (Chad Whitacre)
Date: Mon, 15 Jan 2007 22:00:53 -0500
Subject: [Web-SIG] [ANN] Aspen 0.5 -- module reloading & directory
	handlers
In-Reply-To: <20061215184320.GA21306@slack.it>
References: <4574ED76.4050203@zetaweb.com>
	<64ddb72c0612051636j1bdf7c2avaebb79622e37e131@mail.gmail.com>
	<45764B32.6080006@zetaweb.com> <4579D762.2080200@colorstudy.com>
	<f593a5ce0612081630g44bf87d9r3ae4c99c88866a53@mail.gmail.com>
	<457A092C.3010403@colorstudy.com> <457A21C2.2060506@zetaweb.com>
	<20061215184320.GA21306@slack.it>
Message-ID: <45AC3FE5.6010007@zetaweb.com>

Francesco,

Thanks for taking the bait. Sorry to wait so long to respond. :)

 >>    $ cd /usr/local/www/example.com
 >>    $ aspen
 >>    ...
 >>
 >> What happens next?
 >
 > let me try..
 >
 >  $ aspen
 >  Aspen - Python web server - version 0.5
 >  usage:
 >   aspen [-aAbCfhHkRstTzZ]
 >
 >  try "aspen --help" for more info

I was very interested when I saw this suggestion! It makes a lot 
of sense. However, Ian's guess was closer (although he may have 
cheated by looking at the docs :-P ).

Aspen's command-line interface is patterned after apachectl and 
zopectl. But Aspen is not monolithic like the former, nor 
framework-specific like the latter.

$ aspen --help # does get you help though ;-)


 >  hello everyone, i'm new on this list

Welcome. :)


chad

From sh at defuze.org  Wed Jan 17 16:07:24 2007
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Wed, 17 Jan 2007 15:07:24 +0000
Subject: [Web-SIG] ANN: amplee 0.4.0
Message-ID: <45AE3BAC.2020909@defuze.org>

Hi all,

I finally released a new version of amplee. I've moved from 0.3.x to
0.4.x as there are a couple of modifications to the API that were worth
the bump. I think this version is much more stable and bug free. Mind
you, it's a long way before I can claim it is entirely unit tested, but
it is getting there.

The main modifications since 0.3.6 are:

 * Added a loader feature. I realized that setting up the store was a
recurrent task and I wondered how to make it easier. I came up with
the loader feature. Basically you describe your APP store within a
config file (pure INI) and call the loader method. This will construct
your entire store and return it to you. This is quite handy and makes
the creation of a store much easier.

 * Handler API introduced. Amplee does the best it can to provide you
with an API and tools to handle the dirty work of APP and lets you enhance
it through a callback system. In previous versions those callbacks were
attached to the collection, which forced some not very friendly hacks.
Now it's a matter of creating a class that implements a set of methods
which will then be called by amplee at the right moment. This class is
what I call a handler and is associated with a media-type that the
collection accepts.

 * The loading and reloading of members is more flexible. In the past
you could only reload all members of a store or none. Now you have
finer control over what should be loaded into a collection's cache.

 * Many notable bugs have been fixed in the handling of Atom within the
members and they should be much more reliable now.

 * You can now find a small blog example that shows you how to use amplee.

If you are thinking of upgrading you should note that because of the
modifications to the callback API you may have some work to do. But this
should not be too difficult.


== Download ==

 * easy_install -U amplee
 * Tarballs http://www.defuze.org/oss/amplee/
 * svn co https://svn.defuze.org/oss/amplee/

== Documentation ==

http://trac.defuze.org/wiki/amplee
http://www.defuze.org/oss/amplee/doc/html/

== Examples ==

You can get some source code examples at
http://defuze.org/oss/amplee/amplee-example-0.4.0.tgz

== TODO ==

 * Add more tests
 * Improve documentation
 * Test with IronPython
 * Enhance WSGI support

Have fun,
-- Sylvain Hellegouarch
http://www.defuze.org


From grahamd at dscpl.com.au  Sat Jan 27 03:17:47 2007
From: grahamd at dscpl.com.au (Graham Dumpleton)
Date: Sat, 27 Jan 2007 13:17:47 +1100
Subject: [Web-SIG] Relationship between SCRIPT_NAME and PATH_INFO.
Message-ID: <04D48E95-E467-441E-98DB-CD7FCE6F1996@dscpl.com.au>

In the PEP it says:

SCRIPT_NAME
   The initial portion of the request URL's "path" that corresponds
   to the application object, so that the application knows its virtual
   "location". This may be an empty string, if the application
   corresponds to the "root" of the server.

PATH_INFO
   The remainder of the request URL's "path", designating the virtual
   "location" of the request's target within the application. This may
   be an empty string, if the request URL targets the application root
   and does not have a trailing slash.

Seeking further clarification on what happens in certain circumstances,
paste.lint says:

   - That SCRIPT_NAME and PATH_INFO are empty or start with /

   - That at least one of SCRIPT_NAME or PATH_INFO are set.

   - That SCRIPT_NAME is not '/' (it should be '', and PATH_INFO should
     be '/').

As an illustration of what this all appears to mean:

   Mount Point: /application

   Request URL: /application/something

yields:

   SCRIPT_NAME: /application
   PATH_INFO: /something

and:

   Request URL: /application

yields:

   SCRIPT_NAME: /application

with PATH_INFO not needing to actually be defined as it will be empty.

Further:

   Request URL: /application/

yields:

   SCRIPT_NAME: /application
   PATH_INFO: /

For an application mounted at the root of the web server:

   Mount Point:

   Request URI: /something

yields:

   SCRIPT_NAME:
   PATH_INFO: /something

Where SCRIPT_NAME doesn't really need to be defined given that it is empty.

All okay so far.

Now my questions revolve around where an application is mounted
at a URL which itself has a trailing slash. For example, in Apache
one can say:

   <Location /application/>
   ...
   </Location>

If a request arrives which is for '/application', it will not actually be
directed to the application because it doesn't have the required trailing
slash and so will not match the path in the directive.

In effect the mount point of the application is '/application/'. One cannot
treat the mount point as being '/application' as if that is then used by user
code to reference back to the root of the application for a link or redirect
it will not actually work as the trailing slash is missing.

Thus, this would suggest that for this case that one would have:

   SCRIPT_NAME: /application/

This though doesn't seem to marry up with WSGI very well. This is because
reconstruction of URLs indicates that all that is required is to join
SCRIPT_NAME and PATH_INFO back together. Ie.,

   from urllib import quote
   url = environ['wsgi.url_scheme']+'://'

   if environ.get('HTTP_HOST'):
       url += environ['HTTP_HOST']
   else:
       url += environ['SERVER_NAME']

       if environ['wsgi.url_scheme'] == 'https':
           if environ['SERVER_PORT'] != '443':
              url += ':' + environ['SERVER_PORT']
       else:
           if environ['SERVER_PORT'] != '80':
              url += ':' + environ['SERVER_PORT']

   url += quote(environ.get('SCRIPT_NAME',''))
   url += quote(environ.get('PATH_INFO',''))
   if environ.get('QUERY_STRING'):
       url += '?' + environ['QUERY_STRING']

If that is seen as being the case, then we would need to have:

   Mount Point: /application/

   Request URL: /application/something

yields:

   SCRIPT_NAME: /application/
   PATH_INFO: something

This though violates what paste.lint says is correct:

   - That SCRIPT_NAME and PATH_INFO are empty or start with /

Ie., can't have PATH_INFO start without a slash. But to put one in and still
have SCRIPT_NAME valid as far as what the mount point is, you would need:

   SCRIPT_NAME: /application/
   PATH_INFO: /something

In URL reconstruction though this would yield:

   /application//something

Ie., introduce a double slash.
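
The three combinations discussed above can be compared with a short sketch. It uses the same plain concatenation as the URL reconstruction code, but is written for Python 3 (hence urllib.parse rather than urllib), and the helper name rebuild_path is hypothetical.

```python
from urllib.parse import quote


def rebuild_path(script_name, path_info):
    # Same join as the URL reconstruction code: plain concatenation.
    return quote(script_name) + quote(path_info)


for script_name, path_info in [
    ('/application', '/something'),    # mount point without trailing slash
    ('/application/', 'something'),    # PATH_INFO lacks its leading slash
    ('/application/', '/something'),   # both carry the slash
]:
    print(repr(script_name), repr(path_info), '->',
          rebuild_path(script_name, path_info))
```

Only the first pair both rebuilds the original path and keeps PATH_INFO starting with a slash.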

It also sort of violates the definition of PATH_INFO in the PEP as well.

It therefore seems that the idea of the mount point for an application having
a trailing slash may be incompatible with WSGI. Can this be considered to be
the case or is there some other way one is meant to deal with this?

Should a WSGI adapter for a web server which allows a mount point to have a
trailing slash specifically flag as a configuration error an attempt to use
such a mount point given that it appears to be incompatible with WSGI?

Thanks in advance for any feedback.

Graham

From pywebsig at alan.kennedy.name  Sun Jan 28 20:07:56 2007
From: pywebsig at alan.kennedy.name (Alan Kennedy)
Date: Sun, 28 Jan 2007 19:07:56 +0000
Subject: [Web-SIG] Relationship between SCRIPT_NAME and PATH_INFO.
In-Reply-To: <04D48E95-E467-441E-98DB-CD7FCE6F1996@dscpl.com.au>
References: <04D48E95-E467-441E-98DB-CD7FCE6F1996@dscpl.com.au>
Message-ID: <4a951aa00701281107i5178cc10gc9a3a89278935a65@mail.gmail.com>

[Graham Dumpleton]
> Should a WSGI adapter for a web server which allows a mount point to
> have a trailing slash specifically flag as a configuration error an
> attempt to use such a mount point given that it appears to be
> incompatible with WSGI?

OK, I'll have a go.

I think the question boils down to the following:

Assume an application mount point of "/application".

If a request is received for

/application

Then it will (and should) be redirected to the URL

/application/

Is that new URL to be interpreted as

SCRIPT_NAME: /application
PATH_INFO:   /

or interpreted as

SCRIPT_NAME: /application/
PATH_INFO:

I think that the WSGI interpretation is the first interpretation, and
the correct one, because it gives correct results when deriving
relative URLs for resources contained within the application.

Is that addressing the question?

[Graham Dumpleton]
> It therefore seems that the idea of the mount point for an
> application having a trailing slash may be incompatible
> with WSGI. Can this be considered to be the case or is there
> some other way one is meant to deal with this?

I don't know about "incompatible", although it obviously creates the
double-slash problem with computed URLs.

Perhaps the Apache "policy" on this issue is influenced by its origins
as a http server for serving hierarchies of directories and files from
a filesystem?

When it comes to CGI though, Apache does the right thing and passes

SCRIPT_NAME: /application
PATH_INFO:   /

to CGI scripts.

I don't know if this provides any insight into whether or not mounting
applications with a trailing slash is an error.

Does that help at all?

Alan.

From ianb at colorstudy.com  Sun Jan 28 20:46:21 2007
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 28 Jan 2007 13:46:21 -0600
Subject: [Web-SIG] Relationship between SCRIPT_NAME and PATH_INFO.
In-Reply-To: <04D48E95-E467-441E-98DB-CD7FCE6F1996@dscpl.com.au>
References: <04D48E95-E467-441E-98DB-CD7FCE6F1996@dscpl.com.au>
Message-ID: <45BCFD8D.7050103@colorstudy.com>

Graham Dumpleton wrote:
> In the PEP it says:
> 
> SCRIPT_NAME
>    The initial portion of the request URL's "path" that corresponds
>    to the application object, so that the application knows its virtual
>    "location". This may be an empty string, if the application
>    corresponds to the "root" of the server.
> 
> PATH_INFO
>    The remainder of the request URL's "path", designating the virtual
>    "location" of the request's target within the application. This may
>    be an empty string, if the request URL targets the application root
>    and does not have a trailing slash.
> 
> Seeking further clarification on what happens in certain circumstances,
> paste.lint says:
> 
>    - That SCRIPT_NAME and PATH_INFO are empty or start with /
> 
>    - That at least one of SCRIPT_NAME or PATH_INFO are set.
> 
>    - That SCRIPT_NAME is not '/' (it should be '', and PATH_INFO should
>      be '/').
> 
> As an illustration of what this all appears to mean:
> 
>    Mount Point: /application
> 
>    Request URL: /application/something
> 
> yields:
> 
>    SCRIPT_NAME: /application
>    PATH_INFO: /something
> 
> and:
> 
>    Request URL: /application
> 
> yields:
> 
>    SCRIPT_NAME: /application
> 
> with PATH_INFO not needing to actually be defined as it will be empty.

Note that PEP 333 conflicts with the CGI specification here -- the CGI 
specification says that PATH_INFO (and SCRIPT_NAME) must always be 
present even when empty.  Since PEP 333 references the CGI spec, it's a 
bit inconsistent here.

It would be nice if PEP 333 said that PATH_INFO and SCRIPT_NAME SHOULD 
be set, and if wsgiref.validate produced a warning (but not an exception) 
when either is missing.

> Now my questions revolve around where an application is mounted
> at a URL which itself has a trailing slash. For example, in Apache
> one can say:
> 
>    <Location /application/>
>    ...
>    </Location>
> 
> If a request arrives which is for '/application', it will not actually be
> directed to the application because it doesn't have the required trailing
> slash and so will not match the path in the directive.

Apache does weird things with Alias too, which as an Apache user drives 
me nuts.  E.g., if you do:

   Alias /foo /path

Then /foobar goes to /pathbar.  Nuts.  But if you do:

   Alias /foo/ /path/

Then /foo doesn't work.  I think we should just avoid this stupid 
behavior and act intelligently with respect to trailing slashes.

> In effect the mount point of the application is '/application/'. One cannot
> treat the mount point as being '/application' as if that is then used by user
> code to reference back to the root of the application for a link or redirect
> it will not actually work as the trailing slash is missing.
> 
> Thus, this would suggest that for this case that one would have:
> 
>    SCRIPT_NAME: /application/
> 
> This though doesn't seem to marry up with WSGI very well. This is because
> reconstruction of URLs indicates that all that is required is to join
> SCRIPT_NAME and PATH_INFO back together. Ie.,
> 
>    from urllib import quote
>    url = environ['wsgi.url_scheme']+'://'
> 
>    if environ.get('HTTP_HOST'):
>        url += environ['HTTP_HOST']
>    else:
>        url += environ['SERVER_NAME']
> 
>        if environ['wsgi.url_scheme'] == 'https':
>            if environ['SERVER_PORT'] != '443':
>               url += ':' + environ['SERVER_PORT']
>        else:
>            if environ['SERVER_PORT'] != '80':
>               url += ':' + environ['SERVER_PORT']
> 
>    url += quote(environ.get('SCRIPT_NAME',''))
>    url += quote(environ.get('PATH_INFO',''))
>    if environ.get('QUERY_STRING'):
>        url += '?' + environ['QUERY_STRING']
> 
> If that is seen as being the case, then we would need to have:
> 
>    Mount Point: /application/
> 
>    Request URL: /application/something
> 
> yields:
> 
>    SCRIPT_NAME: /application/
>    PATH_INFO: something

IMHO it's up to the dispatcher to make sure this sort of thing just 
doesn't happen.  In paste.urlmap I allow a trailing slash to be 
specified in a mount point, but ignore it, preferring instead to enforce 
internal consistency.  This seems to avoid the question you are bringing 
up here. In paste.urlmap when I get a request for '/application' I do 
the redirect in the dispatcher to '/application/', and I don't allow a 
mount point of '/application' to match '/applicationplussome'.

The nice part of this is that when you've coded it in the dispatcher It 
Just Works for people using that dispatcher, and they don't have to 
think about any of these WSGI details ;)
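
A rough sketch of a dispatcher with the behaviour described above. This is not the paste.urlmap implementation; MountDispatcher and its methods are hypothetical names, and the redirect uses a path-only Location for brevity.

```python
class MountDispatcher:
    """Hypothetical dispatcher; not the paste.urlmap implementation."""

    def __init__(self):
        self.mounts = []

    def mount(self, prefix, app):
        # Accept a trailing slash in the configured prefix but ignore it.
        self.mounts.append((prefix.rstrip('/'), app))
        # Longest prefix first so nested mounts win.
        self.mounts.sort(key=lambda item: len(item[0]), reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        script = environ.get('SCRIPT_NAME', '')
        for prefix, app in self.mounts:
            if prefix and path == prefix:
                # Bare mount point: redirect to the form with the slash.
                start_response('301 Moved Permanently',
                               [('Location', script + prefix + '/')])
                return [b'']
            if path.startswith(prefix + '/'):
                # Matches only on a segment boundary, so '/application'
                # never captures '/applicationplussome'.
                environ = dict(environ)
                environ['SCRIPT_NAME'] = script + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not Found']
```

With this approach SCRIPT_NAME never ends in a slash and PATH_INFO always starts with one, whichever way the mount point was written in the configuration.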


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org

From pje at telecommunity.com  Sun Jan 28 21:15:06 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun, 28 Jan 2007 15:15:06 -0500
Subject: [Web-SIG] Relationship between SCRIPT_NAME and PATH_INFO.
In-Reply-To: <4a951aa00701281107i5178cc10gc9a3a89278935a65@mail.gmail.co
 m>
References: <04D48E95-E467-441E-98DB-CD7FCE6F1996@dscpl.com.au>
	<04D48E95-E467-441E-98DB-CD7FCE6F1996@dscpl.com.au>
Message-ID: <5.1.1.6.0.20070128150944.0453de78@sparrow.telecommunity.com>

At 07:07 PM 1/28/2007 +0000, Alan Kennedy wrote:
>[Graham Dumpleton]
> > Should a WSGI adapter for a web server which allows a mount point to
> > have a trailing slash specifically flag as a configuration error an
> > attempt to use such a mount point given that it appears to be
> > incompatible with WSGI?
>
>[snip]
>I don't know if this provides any insight into whether or not mounting
>applications with a trailing slash is an error.
>
>Does that help at all?

I think it's safe to say that WSGI does not permit an application to live 
at a mount point with a trailing '/', unless it is the root of the host.

Whether this is a good thing or not is a separate question.  In truth, it 
had never occurred to me that such a thing was possible or practical.  If 
you look at the wsgiref.util.shift_path_info(), you'll see that it supports 
the possibility of having a trailing slash on a URL, and treating it 
differently, but the assumption is that all WSGI applications live at 
either the root or a location without a trailing /.

Given the weird effects that result from trying to manage relative names 
and other such complications of the idea, I don't think we should extend 
WSGI to allow applications to live at non-root URLs with trailing 
slashes.  They should live at the named location, and optionally get a 
PATH_INFO.  It's up to the application to interpret the trailing /, if any.
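
To see how wsgiref.util.shift_path_info() treats a trailing slash, here is a small sketch that shifts segments until none are left. The environ values are made up for illustration.

```python
from wsgiref.util import shift_path_info

environ = {'SCRIPT_NAME': '/application', 'PATH_INFO': '/something/'}

steps = []
while True:
    name = shift_path_info(environ)
    steps.append((name, environ['SCRIPT_NAME'], environ['PATH_INFO']))
    if name is None:
        break

for step in steps:
    print(step)
```

The first shift returns 'something' and leaves PATH_INFO as '/'. The second returns an empty string for the trailing slash and moves that slash onto SCRIPT_NAME. The third returns None because nothing is left.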


From grahamd at dscpl.com.au  Mon Jan 29 01:10:03 2007
From: grahamd at dscpl.com.au (Graham Dumpleton)
Date: Sun, 28 Jan 2007 19:10:03 -0500
Subject: [Web-SIG] Repeating slashes in REQUEST_URI,
	SCRIPT_NAME and PATH_INFO.
Message-ID: <1170029403.6800@dscpl.user.openhosting.com>

Another question on SCRIPT_NAME, PATH_INFO etc.

This time I am after information on what responsibilities an adapter for a
specific web server has in respect of removal and/or preservation of repeating
slashes in a request URI.

Take for example that a WSGI application is mounted at:

  /wsgi/a

and that the request URI is:

  REQUEST_URI: '/////wsgi//////a///b//c/d'

What should SCRIPT_NAME and PATH_INFO be set to? Should repeating slashes
be removed from SCRIPT_NAME so that it matches the normalised mount point,
or should the repeating slashes be preserved?

Thus should the above REQUEST_URI yield:

  SCRIPT_NAME: '/wsgi/a'
  PATH_INFO: '///b//c/d'

or perhaps:

  SCRIPT_NAME: '/////wsgi//////a'
  PATH_INFO: '///b//c/d'

Similarly should repeating slashes be left as is in the PATH_INFO?

I note that path_info_pop() in paste says:

        >>> def call_it(script_name, path_info):
        ...     env = {'SCRIPT_NAME': script_name, 'PATH_INFO': path_info}
        ...     result = path_info_pop(env)
        ...     print 'SCRIPT_NAME=%r; PATH_INFO=%r; returns=%r' % (
        ...         env['SCRIPT_NAME'], env['PATH_INFO'], result)
        >>> call_it('/foo', '/bar')
        SCRIPT_NAME='/foo/bar'; PATH_INFO=''; returns='bar'
        >>> call_it('/foo/bar', '')
        SCRIPT_NAME='/foo/bar'; PATH_INFO=''; returns=None
        >>> call_it('/foo/bar', '/')
        SCRIPT_NAME='/foo/bar/'; PATH_INFO=''; returns=''
        >>> call_it('', '/1/2/3')
        SCRIPT_NAME='/1'; PATH_INFO='/2/3'; returns='1'
        >>> call_it('', '//1/2')
        SCRIPT_NAME='//1'; PATH_INFO='/2'; returns='1'

The last comment demonstrates the need to treat repeating slashes
as a single slash, but also seems to indicate that SCRIPT_NAME can have
repeating slashes in it. Running the code yields:

BEFORE: {'PATH_INFO': '///b//c/d', 'SCRIPT_NAME': '/////wsgi//////a'}
RESULT: 'b'
AFTER: {'PATH_INFO': '//c/d', 'SCRIPT_NAME': '/////wsgi//////a///b'}

In wsgiref.util.shift_path_info(), although it also treats repeating slashes
as one, it strips all the repeating slashes out.

BEFORE: {'PATH_INFO': '///b//c/d', 'SCRIPT_NAME': '/////wsgi//////a'}
RESULT: 'b'
AFTER: {'PATH_INFO': '/c/d', 'SCRIPT_NAME': '/wsgi/a/b'}

What is the accepted convention for dealing with repeating slashes? Should
any web server adapter leave repeating slashes in both SCRIPT_NAME and
PATH_INFO, or should it at least normalise SCRIPT_NAME so that it matches
the designated mount point?

Thanks in advance.

Graham

From fumanchu at amor.org  Mon Jan 29 05:44:46 2007
From: fumanchu at amor.org (Robert Brewer)
Date: Sun, 28 Jan 2007 20:44:46 -0800
Subject: [Web-SIG] Repeating slashes in REQUEST_URI,
	SCRIPT_NAME and PATH_INFO.
References: <1170029403.6800@dscpl.user.openhosting.com>
Message-ID: <435DF58A933BA74397B42CDEB8145A86224D22@ex9.hostedexchange.local>

Graham Dumpleton wrote:
> What is the accepted convention for dealing with repeating slashes?
> Should any web server adapter leave repeating slashes in both
> SCRIPT_NAME and PATH_INFO, or should it at least normalise
> SCRIPT_NAME so that it matches the designated mount point?

The URI BNF allows for empty path segments, so a doubled slash has its own distinct meaning. And since "the designated mount point" is so designated by the URI, I would think one should leave doubled slashes in. IMO this certainly applies to PATH_INFO, although I could understand someone writing a server that normalized SCRIPT_NAME (and telling its users it was limited in that way).
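
As a sketch of the second option only (a server that normalises SCRIPT_NAME to the mount point while leaving PATH_INFO exactly as received), something along these lines would do. The function name split_on_mount is hypothetical and the code is not taken from any existing server.

```python
import re


def split_on_mount(request_path, mount_point):
    """Split request_path into (SCRIPT_NAME, PATH_INFO) for mount_point.

    Returns None when the path does not fall under the mount point.
    """
    segments = [s for s in mount_point.split('/') if s]
    pos = 0
    for segment in segments:
        # One or more slashes, then the segment, then a slash or the end.
        pattern = re.compile('/+' + re.escape(segment) + '(?=/|$)')
        match = pattern.match(request_path, pos)
        if match is None:
            return None
        pos = match.end()
    # SCRIPT_NAME is rebuilt from the mount point, so it is normalised.
    script_name = ''.join('/' + s for s in segments)
    # Everything after the mount point is passed through untouched.
    return script_name, request_path[pos:]
```

For the request URI from the original question, this gives SCRIPT_NAME '/wsgi/a' and PATH_INFO '///b//c/d'.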


Robert Brewer
System Architect
Amor Ministries
fumanchu at amor.org