[Web-SIG] buffer used by socket, should also work with python stdlib Re: Request for Comments on upcoming WSGI Changes

Graham Dumpleton graham.dumpleton at gmail.com
Tue Sep 22 00:45:31 CEST 2009


2009/9/21 René Dudfield <renesd at gmail.com>:
> On Mon, Sep 21, 2009 at 1:27 PM, Armin Ronacher
> <armin.ronacher at active-4.com> wrote:
>> Hi,
>>
>> René Dudfield wrote:
>>> That's all the arguing and explaining I'll do on this - I'm not going
>>> to rewrite cherrypy for you as proof.
>> If it just puts a burden on implementors on the client and server side
>> and there is no proof for it to be faster for real world applications we
>> can probably just ignore that then.
>>
>>
>> Regards,
>> Armin
>>
>
> hi,
>
> yes I think ignoring it for now is a good idea.
>
>
> However, it could be a good addition to a future spec.
>
> Currently WSGI gives anything built on top of it no way to use such
> buffers.
>
> It's zero extra work for implementors who don't want to specify a
> buffer.  Implementors and clients can just not pass in or use a
> destination buffer.
>
> # non caring use:
> buf = recv(socket, nbytes)
>
> # buffer caring use:
> buffer = pool.get_buffer()
> buf = recv(socket, nbytes, buffer)
>
> So I don't see it as a burden to use for people who don't care about it.
>
>
> To explain the mmap use case more clearly... you could pass in a
> memory mapped buffer to allow the process to write to disk directly...
> or as shared memory so other processes can mmap the data and process
> it.  Rather than sending your data over a pipe (as in FastCGI), you can
> just access it directly.
>
> As another piece of evidence that it is faster to use buffers, rather
> than allocate all the time, nginx uses memory pools.  So does
> apache... and lighttpd...

WSGI is specifically intended to be a Python-specific API definition
only. It isn't, and never will be, expanded to encompass a wire
protocol, to provide direct support for a foreign wire protocol for
communication across a socket connection, or to enable optimisations
across such a connection that are specific to some existing wire
protocol.

The whole point of WSGI is that it is the lowest common denominator
and really really simple.

That said, wsgi.file_wrapper already provides a rather large hole for
at least some optimisations when returning response data via the
client connection, although not many WSGI server implementations
provide such optimisations.

The only constraint on wsgi.file_wrapper is that the object supplied
to it be file-like to the extent of providing a read() method. That
requirement is purely a fallback for the case where the WSGI server
cannot implement optimisations based on the actual type of the
file-like object supplied to it. In that case the wsgi.file_wrapper
instance acts just like a normal iterable, and so has to be able to
read data in chunks from the file-like object.
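
To make the fallback concrete, here is a minimal sketch of a
wsgi.file_wrapper that performs no optimisation at all, modelled
loosely on the example class in the WSGI specification. The class name
and the default block size are illustrative choices only, not what any
particular server ships.

```python
import io


class FileWrapper:
    """Fallback wrapper: turns any object with read() into an iterable."""

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        # Expose close() only if the wrapped object has one, so the
        # server can call it when the response is finished.
        if hasattr(filelike, "close"):
            self.close = filelike.close

    def __iter__(self):
        return self

    def __next__(self):
        # Read one block at a time; an empty result means end of data.
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise StopIteration


# Usage: 20000 bytes are handed back in blocks of at most 8192 bytes.
chunks = list(FileWrapper(io.BytesIO(b"a" * 20000)))
```

A server that recognises the concrete type of the wrapped object is
free to bypass this iteration entirely.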

In Apache/mod_wsgi, if the argument to wsgi.file_wrapper is a
file-like object which provides fileno() and tell() methods, then on
UNIX systems it will already optimise the return of the file contents
by using sendfile() or memory mapping techniques.

People have even used a small wrapper class around a Python mmap
object so that fileno() and tell() are both visible on the one object,
which satisfies that requirement, and so have been able to implement
optimised return of mmap'd data via Apache/mod_wsgi.
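
As a rough illustration of the kind of wrapper meant here, the sketch
below pairs a Python mmap object with the file it maps. The class name
MmapFile and its exact set of methods are hypothetical; this is not
code taken from any particular application, and how a given server
inspects the object is up to that server.

```python
import mmap
import tempfile


class MmapFile:
    """Hypothetical wrapper exposing fileno(), tell() and read() together."""

    def __init__(self, fileobj):
        self._file = fileobj
        # Length 0 maps the whole file; the file must not be empty.
        self._map = mmap.mmap(fileobj.fileno(), 0, access=mmap.ACCESS_READ)

    def fileno(self):
        # The mmap object has no fileno(), so borrow the file's.
        return self._file.fileno()

    def tell(self):
        return self._map.tell()

    def read(self, size=-1):
        # read() keeps the object usable with servers that only iterate.
        if size is None or size < 0:
            return self._map.read()
        return self._map.read(size)

    def close(self):
        self._map.close()
        self._file.close()


# Usage with a temporary file standing in for real content. In a WSGI
# application the object would be passed to environ['wsgi.file_wrapper'].
tmp = tempfile.TemporaryFile()
tmp.write(b"hello world")
tmp.flush()
wrapped = MmapFile(tmp)
```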

In other words, Apache/mod_wsgi already provides mechanisms which
avoid any in-process memory copies when returning open files and/or
memory mapped files.

A WSGI server could already, if it wanted, allow wsgi.file_wrapper to
accept a special object which wraps your 'buffer' data, treat that
object specially, and use the mechanisms you describe to send the
buffer by optimised means directly out onto a socket connection with
no additional copies involved. The only requirement is that the
special object supply read() and close() methods as appropriate, so
that it will still work with WSGI servers that don't implement your
optimisation.
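
A sketch of what that might look like follows. Everything in it is an
assumption made for illustration: BufferSource and send_response_body
are invented names, and a real server would hook the type check into
its own wsgi.file_wrapper handling rather than a free function.

```python
import socket


class BufferSource:
    """Hypothetical special object wrapping 'buffer' data."""

    def __init__(self, data):
        self._view = memoryview(data)
        self._pos = 0

    def view(self):
        # Optimised path: the server takes the buffer without copying it.
        return self._view

    def read(self, size=-1):
        # Fallback path for servers that only know about read().
        if size is None or size < 0:
            size = len(self._view) - self._pos
        chunk = self._view[self._pos:self._pos + size]
        self._pos += len(chunk)
        return bytes(chunk)

    def close(self):
        self._view.release()


def send_response_body(sock, source, blksize=8192):
    """Hypothetical server-side handling of a wrapped object."""
    if isinstance(source, BufferSource):
        # Server recognises the type: send the buffer directly.
        sock.sendall(source.view())
    else:
        # Any other file-like object: read and send in chunks.
        while True:
            chunk = source.read(blksize)
            if not chunk:
                break
            sock.sendall(chunk)
    if hasattr(source, "close"):
        source.close()


# Usage over a local socket pair standing in for a client connection.
server_end, client_end = socket.socketpair()
send_response_body(server_end, BufferSource(b"x" * 100))
received = client_end.recv(1024)
server_end.close()
client_end.close()
```

A server without the special case never calls view() and simply uses
read() and close(), which is what keeps the object portable.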

No changes are required to the WSGI specification for this part to be done now.

Thus, all you need to do is convince the author of an existing pure
Python WSGI server to provide the feature, or take one of the WSGI
servers yourself and implement it.

Graham

