[Web-SIG] Implementing File Upload Size Limits

Sat Nov 22 10:12:26 CET 2008

2008/11/22 Randy Syring <randy at rcs-comp.com>:
> I am looking for opinions and thoughts on best practice for limiting file
> upload size.  I have a few considerations:
>
> Ultimately, I would want my application with my method of handling forms to
> be able to give the user a message that the file size was too big.  That
> means that, however the size is limited, just blanking out wsgi.input and
> setting content-length to zero doesn't seem correct.  That would make it
> look like the form wasn't submitted with any data I believe.
>
> Given the above, it seems that something would need to get put in the
> environment to tell middleware and the application that the file input was
> aborted, but what would be the best way for doing it?  Should it be some
> kind of standard, or just dependent on your server or middleware?
>
> It seems best to implement this functionality as the very first middleware
> in the stack.  Since other middleware read and manipulate wsgi.input,
> handling the upload size at the application level wouldn't prevent middleware
> from wasting resources dealing with a very large file.
>
> Is it possible to prevent the server from even accepting all the data (i.e.
> trying to save bandwidth and server resources) if the content-length is
> known to be too big?  Or is the server required to take all the client's
> data regardless, even if it ends up going in the bit bucket?  I realize some
> of this is server specific, not WSGI specific, but I would be interested in
> knowing how the most popular servers handle this or what the HTTP specs
> require if anyone knows.
>
> Thanks in advance for any insight you might be able to provide.

If you use Apache/mod_wsgi to host your WSGI application, the best way
of handling this is to use the Apache LimitRequestBody directive in the
appropriate context. This will result in Apache returning an
HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If
you need a custom error document for that response type, use the Apache
ErrorDocument directive to specify the URL of a handler which will
generate it.

Unless you delegate the custom error document to the WSGI
application, doing it this way means everything is handled by
Apache/mod_wsgi and your WSGI application will not even be invoked.
The request body would also not be read by Apache at all.
Do note that whether this stops the client from sending the request
body depends on whether the client was waiting for a '100 Continue'
response before sending the data. I believe most web browsers still
don't use the '100 Continue' mechanism.

This would be the preferred solution for Apache/mod_wsgi, as it is
handled at the lowest level and guarantees that the request content is
not read at that point. It does, however, take control out of your
application.

For Apache/mod_wsgi, if you do not do it this way but instead validate
the content length in the WSGI application and have the WSGI application
return an HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then
whether the request content gets read depends on whether you are using
embedded mode or daemon mode of mod_wsgi.

If you use embedded mode, so long as your WSGI application doesn't
read the input and just returns the error response, the request
content wouldn't be read at all. If you are using daemon mode, however,
then the request content would always be read by the Apache child worker
process, even if the client asked for a '100 Continue' response. This is
because the Apache child worker process will always proxy the request
content to the daemon process.
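
As a rough sketch of validating the content length in the WSGI
application itself, the check could live in a small piece of WSGI
middleware placed first in the stack. The class name LimitUploadSize and
the 10 MB default are made up for illustration, and this only looks at
the declared CONTENT_LENGTH, so a request that arrives without one is
passed through unchecked.

```python
# Hypothetical default limit, purely for illustration.
MAX_BODY_BYTES = 10 * 1024 * 1024


class LimitUploadSize:
    """WSGI middleware that rejects requests whose declared body is too big.

    It never reads wsgi.input; it only inspects CONTENT_LENGTH.
    """

    def __init__(self, app, max_bytes=MAX_BODY_BYTES):
        self.app = app
        self.max_bytes = max_bytes

    def _error(self, start_response, status, message):
        # Send a short plain-text error response without calling the app.
        body = message.encode("utf-8")
        start_response(status, [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    def __call__(self, environ, start_response):
        raw = environ.get("CONTENT_LENGTH") or "0"
        try:
            length = int(raw)
        except ValueError:
            # A malformed length is treated as a client error.
            return self._error(start_response, "400 Bad Request",
                               "Invalid Content-Length.\n")
        if length > self.max_bytes:
            return self._error(start_response,
                               "413 Request Entity Too Large",
                               "Uploaded file is too large.\n")
        # Within the limit: hand the request to the wrapped application.
        return self.app(environ, start_response)
```

You would wrap your application with something like
application = LimitUploadSize(application, max_bytes=...) in the WSGI
script file. Because the middleware never touches wsgi.input, in embedded
mode the request content is left unread when the 413 is returned.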

Anyway, that is how things are for Apache/mod_wsgi.

Graham