[Web-SIG] Proposal: Handling POST forms in WSGI

Sun Oct 22 20:05:56 CEST 2006

Phillip J. Eby wrote:
> At 02:04 PM 10/21/2006, Ian Bicking wrote:
>> I've added another spec to wsgi.org:
>> http://wsgi.org/wsgi/Specifications/handling_post_forms
>>
>> This one is a little more intrusive than wsgi.url_vars, but it addresses
>> an outstanding source of problems: contention over wsgi.input.
> 
> -1 on this being middleware.  If middleware wants to read the input, it 
> should copy it to a temporary file or StringIO, not remove it.

This isn't middleware, it's a suggestion of a library routine for 
reading POST form submissions.  If multiple consumers use this same 
routine (or generally, the algorithm described) then they won't conflict.

Copying to a StringIO or tempfile is possible, though it introduces a 
couple of layers of indirection where likely none is needed. 
Potentially wsgi.input could be replaced with something that lazily 
serializes the parsed form back into an unparsed form; perhaps coupled 
with a monkeypatch on cgi that detects this case and also provides a 
shortcut.
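
For illustration, a minimal sketch of the copying approach, written 
for present-day Python 3 (so io.BytesIO stands in for StringIO).  The 
helper name clone_input is hypothetical, and the sketch assumes 
CONTENT_LENGTH is set and the body fits in memory:

```python
import io


def clone_input(environ):
    """Read the request body and leave a rewound copy in wsgi.input.

    Hypothetical helper: buffers the whole body in memory, so it is
    only reasonable for small form submissions.
    """
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    # Later consumers get a fresh stream positioned at the start.
    environ["wsgi.input"] = io.BytesIO(body)
    return body
```

Every layer that wants the body has to repeat this dance (or trust 
that an earlier layer did), which is the indirection I mean.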

> The broader principle here is that WSGI extensions should *add* to the 
> WSGI specification, not subtract from it.  Code running under middleware 
> that does as you have proposed will be unable to use its own form 
> processing or support nested applications.  It's therefore not 
> composable or further extensible, and I therefore have a hard time 
> viewing the proposed middleware as being WSGI compliant.

The status quo is that middleware or framework code that accesses POST 
vars is incompatible with any other middleware, framework code, or 
application that also wants to access POST vars.

This does not subtract from WSGI; it enables a pattern that is currently 
problematic.  It really is problematic: I've run into this contention 
over wsgi.input myself.  Sometimes I would like to access the POST vars 
in middleware but can't, either because doing so causes too many 
problems for code that comes later in the stack, or because wsgi.input 
has already been consumed.

> This is an extremely good example of something that belongs in a 
> *library* and should not be done in middleware.  Only end-application 
> code that knows no further dispatching will occur is in a position to do 
> destructive reading from wsgi.input.  Middleware should be 
> non-destructive, and should NOT be used where a library will suffice, 
> since they add setup complexity and runtime performance overhead.

End application code knows no further dispatching will occur, but 
framework code does not know this.  Typically it is a framework that 
parses the POST vars, not an application.

> The simple, standard way to do something like this would be to have a 
> library routine like 'get_form_vars(environ)'.  The routine would check 
> for the form vars key, and if not present, then it would process the 
> input and cache the information in the environment.  It could even have 
> an option to clone the input, in case the routine is being used from 
> middleware.

This is what paste.request.parse_formvars does -- I'm suggesting this 
standard so that all consumers, not just people using Paste, can be 
compatible with each other.
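
A rough sketch of such a shared routine, in present-day Python 3.  This 
is not the Paste implementation: the environ key name is made up for 
the example, only urlencoded bodies are handled, and remembering the 
stream alongside the parsed vars is one possible way to notice that 
wsgi.input was swapped out in between:

```python
import io
from urllib.parse import parse_qs

# Hypothetical key name, chosen for this example only.
FORM_KEY = "example.post_form"


def get_form_vars(environ):
    """Return parsed POST vars, reading wsgi.input at most once."""
    stream = environ["wsgi.input"]
    cached = environ.get(FORM_KEY)
    # Reuse the cache only if it was built from this same stream object.
    if cached is not None and cached[0] is stream:
        return cached[1]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stream.read(length)
    form = parse_qs(body.decode("latin-1"), keep_blank_values=True)
    # Remember which stream was parsed, together with the result.
    environ[FORM_KEY] = (stream, form)
    return form
```

Any number of consumers can then call get_form_vars(environ); only the 
first call touches the stream, and the rest get the cached dict.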

> In general, where adding functionality doesn't require that the request 
> or response be modified (as opposed to information simply being added to 
> the environ), it should be done using library routines like this.  There 
> is no middleware setup or call-through overhead, and the calculation of 
> additional environ entries only takes place if the information is 
> actually used.  There is also no need to use string constants as environ 
> keys except in the routines themselves.  This approach should be 
> considered a best practice for *any* additions to the environ.

Reading from wsgi.input effectively does modify the request.
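
A small illustration of that point, using an in-memory stream as a 
stand-in for the server's wsgi.input (the environ here is a toy dict 
built just for the example):

```python
import io

environ = {"CONTENT_LENGTH": "7", "wsgi.input": io.BytesIO(b"a=1&b=2")}

# First consumer, e.g. middleware parsing the form.
first = environ["wsgi.input"].read(7)
# Second consumer, e.g. the application later in the stack.
second = environ["wsgi.input"].read(7)

# first holds the full body; second is empty because the stream
# has already been read to the end.
```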

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org