multitask http server (single-process multi-connection HTTP server)

Luke Kenneth Casson Leighton luke.leighton at gmail.com
Mon Jul 12 19:28:29 EDT 2010


On Mon, Jul 12, 2010 at 10:13 PM, geremy condra <debatem1 at gmail.com> wrote:
> On Mon, Jul 12, 2010 at 4:59 PM, lkcl <luke.leighton at gmail.com> wrote:
>> for several reasons, i'm doing a cooperative multi-tasking HTTP
>> server:
>>  git clone git://pyjs.org/git/multitaskhttpd.git
>>
>> there probably exist perfectly good web frameworks that are capable of
>> doing this sort of thing: i feel certain that twisted is one of them.
>> however, the original author of rtmplite decided to rip twisted out
>> and to use multitask.py and i'm one of those strange people that also
>> likes the idea of using 900 lines of awesome elegant code rather than
>> tens of thousands of constantly-moving-target.

> I may not be fully understanding what you're doing, but is there a
> reason that one of the mixins can't be used?

 yes: they're threaded.  or forking.  this is *single-process*
cooperative multitasking.   but that's not quite an answer, i know.

 perhaps i should explain the three use-cases:

1) pyjamas-web.

 pyjamas-web was an experiment i did, last year, to port pyjamas
http://pyjs.org to run the *exact* same app which normally is either
compiled to javascript (to run in the web browser), or is run as a
desktop app.... that *exact* same app i wanted to run it
*server-side*.

 the reason for doing this is so that javascript would not be needed.

 remember that the apps _really_ look like GTK apps:
     RootPanel().add(HTML("Hello World"))

 so there's a python application, and i re-implemented all of the
widgets to add themselves to an XML document (actually an ElementTree
instance), tidied things up with BeautifulSoup, and voila, splat, spew
forth some HTML, output it from the web framework.  i'd picked
mod_python.

 did it work?  why yes it did!  right up until i had pressed refresh a
few times, or pressed a few application "buttons".

 at that point, it all fell over, because, why?  ****ing threads,
that's why.  HTTP is totally stateless.  each mod_python thread is
arbitrarily picked to run an incoming request.  i had totally
forgotten this, and had "associated" the pyjamas application instance
with the python thread's local storage memory.   oh... shit :)

 so, one person could be viewing an app, then next minute, someone
else connects to the server, and they get the other person's app!!

 the only sane way round this (especially since serialising the
pyjamas app's data structures could mean dumping something like 100k
into a database and retrieving it again on every request!!) is to
have a SINGLE PROCESS multitasking web server, where all the apps are
stored in the SAME PROCESS.

 then, you can store the apps in a dictionary, and look them up by a
session cookie identifier.

 problem is solved.
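the lookup is as simple as it sounds.  a minimal sketch (names here are
illustrative, not from multitaskhttpd) of the one-dictionary-per-process
scheme:

```python
import uuid

# one dictionary, in the one and only server process:
# session cookie id -> application instance
app_instances = {}

class AppInstance(object):
    def __init__(self):
        self.widgets = {}   # per-user state lives here, in memory

def get_app_instance(session_id=None):
    """look up (or create) the app instance for this browser's cookie."""
    if session_id is None:
        session_id = uuid.uuid4().hex   # first visit: mint a session id
    inst = app_instances.get(session_id)
    if inst is None:
        inst = app_instances[session_id] = AppInstance()
    return session_id, inst
```

because there is only one process, the same cookie always reaches the
same dictionary, and therefore the same live application instance.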

2) rtmplite itself

rtmplite currently does RTMP only (port 1935).  for the exact same
reasons as 1) the real-time audio/video streams need to be passed
across from client to client, and it needs to be done QUICKLY.
passing A/V data across unix domain sockets or shared memory in a
_portable_ fashion... does anyone _know_ how to do data sharing
portably using python?  i don't! i'd love to know! :)

so to avoid the problem, easy: just design the app to be
single-process, handling multiple RTMP clients simultaneously.

now i would like to see rtmplite extended to support HTTP (RTMP can be
proxied over HTTP and HTTPS).

3) persistent database connections with per-user credentials

this is the _real_ reason why i'm doing this.  the GNUmed team are
looking to create a web version of GNUmed.  however, their current
system relies on postgresql "roles" for security.  as a
python-wxWidgets app, that's of course absolutely fine.

but, for a _web_ app, it's not in the slightest bit fine, because
_all_ web frameworks assume "global" database credentials.

also, they're counting on the database connections being persistent.

how in hell's name, when using a stateless protocol like HTTP, do you
cross-associate *persistent* and multiple per-user database
connections with a user's browser, when all the web frameworks out
there only really do "global" single-username, single-password
database logins??

and the answer is: by using a cooperative multitasking httpd, which
adds a session cookie to each incoming connection, and uses that to
create an in-memory "application instance" which will STAY in memory
(of the single-process http server).

in that in-memory application instance, you can do whatever you like:
have a login form which takes user/pass as input, then uses that on
the POST to create a database connection, and stores the resultant
authenticated psycopg2 "pool" connection into the app instance.

the app instance is always looked up in the dictionary of app
instances, using the cookie session id as the key, and thus each
browser will _always_ be directed at the correct app instance.
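the login flow described above can be sketched like this.  `connect` is
passed in so the example is self-contained; in GNUmed's case it would be
psycopg2 connecting with that user's own postgresql role (assumption:
the exact pool API isn't shown here):

```python
app_instances = {}   # session cookie id -> per-user state, as before

def handle_login_post(session_id, user, password, connect):
    """on POST of the login form, open a database connection with the
    user's OWN credentials and keep it in the in-memory app instance,
    so later requests from the same cookie reuse the same connection."""
    inst = app_instances.setdefault(session_id, {})
    if 'db' not in inst:
        # with GNUmed this would be an authenticated psycopg2
        # connection (or pool), one per logged-in user
        inst['db'] = connect(user=user, password=password)
    return inst['db']
```

each session gets its own persistent, per-user-authenticated connection,
which is exactly what the "global credentials" frameworks can't do.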


i hate to think how this would be done using any of the standard
MixIns.  even if you wrote a special MixIn which did single-instance
socket handling, you couldn't use it, because BaseHTTPRequestHandler
doesn't "cooperate": it has a "while True" loop serving requests on a
connection until it's closed.  so, you could serve one HTTP 0.9
request and then move on to another person, orrr you could serve one
HTTP 1.0 or 1.1 request that WAS NOT SET TO "Connection: keep-alive"
and then do another person...

... but if ever there was a keep-alive set, you would be FORCED to
serve that one HTTP connection, blocking absolutely everyone else
until they buggered off.

 ... but with multitaskhttpd, even the persistent HTTP connections are
still cooperatively multi-tasking.  so, yah, whilst you're reading a
big file, or serving a big database query, you're hosed (and i think i
have a way to deal with the big database query thing, but it's an
optimisation so am leaving it for now [*]), but other than that,
you're fine.

so - yah.  definitely something that i've not been able to find (and
don't like twisted, even if it could do this).

l.

[*] the solution involves creating some threads, accessible under a
global lock, which run the database queries.  the threads could even
be done using the stdlib ThreadingTCPServer, where each handler
instance holds a persistent database connection.  the sockets are
there merely to notify the main process (the multitask app) that a
particular database query has completed, by sending the unique ID of
the thread down the socket.  the main process (multitask app) can
then take the global lock, get at the list of threads, grab the one
with the right ID, grab the SQL query results and then release the
lock.

the reason for using sockets is that multitask can do cooperative
multitasking on socket filehandles.  by having a special task which
yields on the ThreadingTCPServer connection, the main process can
delay answering a particular HTTP request until the database response
has arrived.
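a bare-bones sketch of that scheme (illustrative names, and a plain
blocking recv standing in for the multitask yield):

```python
import socket
import threading

results = {}                       # thread id -> query result
results_lock = threading.Lock()    # the "global lock" from the text
notify_recv, notify_send = socket.socketpair()

def run_query(thread_id, sql):
    # stand-in for a blocking database query on a persistent connection
    rows = "rows for: " + sql
    with results_lock:
        results[thread_id] = rows
    # tell the main (multitask) process which query just completed
    notify_send.sendall(("%d\n" % thread_id).encode())

t = threading.Thread(target=run_query, args=(7, "SELECT 1"))
t.start()
t.join()

# in multitaskhttpd the main loop would *yield* on notify_recv,
# carrying on with other HTTP requests until this becomes readable
done_id = int(notify_recv.recv(64).decode())
with results_lock:
    print(results[done_id])        # -> rows for: SELECT 1
```

the main loop never blocks on the database itself: it only ever waits
on a socket, which the cooperative scheduler already knows how to do.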
