[Web-SIG] Web Container Interface

Wed Jan 28 17:39:22 EST 2004

Phillip J. Eby wrote:
> At 02:11 PM 1/28/04 -0600, Ian Bicking wrote:
> 
>> Portable Real Web Applications could also adapt their behavior 
>> depending on how they were being run, for instance creating an 
>> abstraction that cached data in module globals if available (and 
>> unlike sessions, that can work in multi-process models), or wrote them 
>> to disk otherwise.
> 
> 
> And those applications don't provide a way to configure that?
> 
> Honestly, my experience supporting applications that "adapt their 
> behavior" without the user's input is rather unpleasant.  Honestly, I'm 
> -1 on providing ways for developers to make their applications decide 
> stuff based on what they *think* is going on, in the absence of narrowly 
> and precisely defined options.
> 
> 
>> In a long-running model, the framework also needs to know when the 
>> environment is shutting down (so it can write data out to disk).  
>> Maybe atexit would be sufficient, I'm not sure (it's not what we use 
>> now).
> 
> 
> And what do you do now when somebody does a "kill -9" on the process, or 
> the machine reboots?  Python doesn't even guarantee that all objects in 
> a process will be finalized during a *normal* exit, so how can any 
> Python container guarantee a finalization notice?  I'd rather we didn't 
> promise what isn't deliverable, or anything that starts blurring 
> responsibilities between container and service.

We can't guarantee anything, but at least we can give the application a 
fighting chance.

>> But why not provide that one little hook (a link to the 
>> gateway/container) that would allow systems to develop greater 
>> portability?
> 
> 
> If you define "portability" as "ability to run under new and 
> never-before-seen containers", that hook would not only provide zero 
> portability improvement, but it would also be an "attractive nuisance" 
> encouraging people to write *non* portable code, that specifically looks 
> for given containers.  Configuration and options should be explicit, not 
> implicit.  "In the face of ambiguity, refuse the temptation to guess".

Truisms, I say!  Anyway, it's not about guessing.  It's about 
hard-coding behavior based on the environment, when it's called for to 
solve demonstrable problems.  You don't get OS-independent programs by 
hiding the operating system from the language (though people have 
tried).  And I don't think you get gateway-independent applications by 
hiding the gateway.

> If the app or framework can choose its behaviors, let it make those 
> options explicit, as part of its configuration.

Configuration sucks!  If the application is not behaving properly in its 
environment, it's a bug.  This is open source (at least, every 
implementation I care about will be), if you can't get the upstream to 
fix the bug, you can always fix it yourself.

That may not be true with horribly bloated or closed-source software, but 
I don't think we should use bad experiences with that sort of software 
to color our vision here.

>>>> And, to make it a little harder, we've often had requests to 
>>>> implement memory-only sessions, to put unpickleable objects into the 
>>>> session. Usually we just tell people to keep these values in module 
>>>> globals.
>>>> But module globals are also unportable across environments.
>>>
>>> But if your framework only supports a "long running, single-process" 
>>> architecture, module globals would work just fine with any gateway 
>>> that supports that.
>>> Frankly, "multi-process only" and "short running" gateways are going 
>>> to be in the minority anyway.  The only gateway I know of that's 
>>> likely to *require* multiple processes is mod_python, and the only 
>>> gateway that's likely to be "short running" is plain CGI.  So, it's 
>>> not like requiring an "LR/SP" gateway is going to dramatically limit 
>>> the choice of gateways for Webware.
>>
>> SkunkWeb also uses multiple processes, run in a separate space from 
>> Apache.
> 
> 
> How does it communicate with Apache?  Does it *require* multiple 
> processes, or *allow* them?

It's like Apache, preforking worker processes.  It communicates with 
mod_skunkweb, which is equivalent to FastCGI, PCGI, SCGI, mod_webware, etc.

It requires a single request per process (or a single process per 
request) as it uses globals in several places, including print.  I don't 
believe it offers any significant configuration of its behavior (number 
of processes and such, but there's no threaded option or anything like 
that).

>>> Obviously, the PEP needs to have examples of these process models 
>>> added, and clarify the nature of the restrictions.  Who knows, maybe 
>>> if we talk about this long enough we'll be able to clarify the 
>>> process models well enough to define a variable that services can 
>>> expose to indicate their compatibility with various process models.
>>
>>
>> I think that would be very useful.  Here's my list:
>>
>> Single process per request:
>> * process ends with request (CGI, Webware OneShot)
>> * process reused (mod_python, SkunkWeb)
> 
> 
> I think you mean single request per process here.
> 
> 
>> Multiple requests per process:
>> * Asynchronous (implied to be single-threaded) (Twisted, Medusa, 
>> CherryPy, BaseHTTPServer)
>> * Threaded (Zope, Webware, CherryPy with different settings)
> 
> 
> Actually, both Twisted and Zope's ZServer use a "async dispatcher in the 
> main thread, requests can be processed in worker threads" model.
> 
> BaseHTTPServer isn't asynchronous, either, and with mixins can be 
> threaded or forking.
> 
> Regarding most of the others you mention, I'm not knowledgeable enough 
> to comment.
> 
> 
>> Most asynch environments can be turned into threaded systems after 
>> runCGI.  Webware, at least, is threaded at the point runCGI is called 
>> (maybe to its detriment), but many systems are not (including Zope, I 
>> think, and probably CherryPy).
> 
> 
> I actually don't understand what you mean here.  But I'll try and tackle 
> definitions for this in a subsequent PEP draft.

Well, what you were referring to just above.  In an LRSP-single process 
you can always just spawn a thread, and turn it into an LRSP-multi 
environment.  So the distinction is a little vague.  Applications that 
are typically threaded, like Zope, may not be threaded until after the 
gateway.
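
To make the "just spawn a thread" point concrete, here is a small sketch. ThreadSpawningService is a hypothetical name, and the sub-service is assumed to be any callable that takes a single request object; neither is defined by the draft PEP.

```python
import threading


class ThreadSpawningService:
    """Hypothetical wrapper that hands each request to a new thread.

    The gateway calls this object from a single thread; the wrapped
    sub-service then runs concurrently, as it would under a threaded
    gateway.
    """

    def __init__(self, subservice):
        # Assumption: subservice is a callable taking one request.
        self.subservice = subservice

    def __call__(self, request):
        worker = threading.Thread(target=self.subservice, args=(request,))
        worker.start()
        # Return the thread so the caller can join() it if it must wait.
        return worker
```

Seen from the gateway, this is still a single service called from one thread, which is why the line between the two LRSP variants is blurry.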

>> I believe most other frameworks are built on mod_python, CGI, or 
>> FastCGI so they are covered under these categories.  I think there 
>> might be a separate threaded model for quixote, but I don't know if 
>> that portion has its own name.  I'm not sure I understand FastCGI well 
>> enough to classify it.
> 
> 
> A quick attempt to clarify what concepts we're dealing with...
> 
> * A "web server" is something that accepts HTTP connections
> 
> * A "gateway protocol" connects a "web server" to a "gateway"
> 
> * A "gateway protocol" may be in-process (e.g. if the server is written 
> in Python or embeds Python) or use some kind of inter-process 
> communication (pipes for CGI, sockets for FastCGI, etc.)
> 
> * If the gateway protocol is in-process, then the process model for the 
> app is limited by the process model of the web server.
> 
> * If the gateway protocol is interprocess, then the process model for 
> the app is determined by the process model of the gateway implementation.
> 
> * The basic process models for a server or gateway are:
> 
>   - preforking, serially reused processes  (e.g. mod_python, PEAK's 
> multiprocess FastCGI runner, etc.)
> 
>   - "long running single process" (LRSP) (e.g. Twisted, ZServer, 
> WSGIServer, any FastCGI runner under Apache if Apache is configured with 
> maxClassProcesses=1, maybe AOLServer too?)
> 
>     + with threads
> 
>     + without threads
> 
>   - fork-on-demand, die-after-one-request (CGI)
> 
> Notice that the server's process model need not be the same as the 
> gateway/container's process model, if the gateway protocol is 
> interprocess.  Indeed, with Apache as the server, you can use any of the 
> process models simply by selecting an appropriate gateway and gateway 
> protocol.
> 
> Anyway, I think I've covered everything possible, except for maybe the 
> idea of using multiple threads in multiple processes, which makes my 
> head hurt.  :)

I think any circumstance where you have more processes/threads than you 
have requests doesn't need to be taken into account.

> So, for short, I guess I'd call the process models "prefork", 
> LRSP-single, LRSP-multi, and fork-and-die (FAD? SRMP?).  Those are just 
> working terms for discussion, the PEP should of course use their full 
> names/descriptions.

I assume LRSP-single is async, and LRSP-multi is threaded?
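
For discussion, the four working terms and the idea of a variable that services expose to indicate their compatibility could be sketched as below. ProcessModel, compatible_models and can_host are invented names for this example only, not anything the PEP defines.

```python
from enum import Enum


class ProcessModel(Enum):
    """The working terms used in this thread."""
    PREFORK = "prefork"
    LRSP_SINGLE = "LRSP-single"
    LRSP_MULTI = "LRSP-multi"
    FORK_AND_DIE = "fork-and-die"


class ExampleService:
    # Hypothetical attribute: the process models this service
    # explicitly declares it can run under.
    compatible_models = frozenset({
        ProcessModel.PREFORK,
        ProcessModel.LRSP_SINGLE,
        ProcessModel.FORK_AND_DIE,
    })


def can_host(gateway_model, service):
    """Return True only if the service explicitly declares support.

    A service that declares nothing is treated as incompatible,
    rather than having the gateway guess.
    """
    declared = getattr(service, "compatible_models", frozenset())
    return gateway_model in declared
```

A gateway running as LRSP-multi would then refuse ExampleService up front instead of failing later under concurrent requests.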

> The most complicated one from a configuration point of view (IMO) is 
> LRSP-multi.  I don't have much experience with developing in that 
> environment, so it would be helpful if those who have could offer some 
> thoughts.  The main options I'm aware of are:
> 
> * Gateway gets factory, instantiates service instance per worker thread, 
> or on demand within configured parameters.  (Here, the gateway drives 
> how many service instances there are.)

The reusability of the service also comes into play here -- i.e., 
services may not be threadsafe, but are reusable.  Reusing them avoids 
much of the performance cost of recreating objects, without requiring 
threadsafety.
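
A sketch of the factory option with reusable but non-threadsafe services, assuming the gateway keeps one instance per worker thread. PerThreadGateway and its handle method are hypothetical names, and a service is again assumed to be a callable that takes one request.

```python
import threading


class PerThreadGateway:
    """Hypothetical gateway keeping one service instance per thread.

    Each worker thread builds its own service on first use and then
    reuses it, so the service never sees concurrent calls and does
    not need to be threadsafe.
    """

    def __init__(self, factory):
        self._factory = factory          # zero-argument callable
        self._local = threading.local()  # per-thread storage

    def handle(self, request):
        service = getattr(self._local, "service", None)
        if service is None:
            # First request on this thread: create and remember it.
            service = self._local.service = self._factory()
        return service(request)
```

Here the gateway decides how many service instances exist: one per worker thread that has handled at least one request.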

> * Gateway gets a single service, that it calls from many threads.  
> Service handles everything on that side of the fence.  So here, the app 
> side controls its threading.

This option seems more likely at the service level (but maybe not the 
resource level, which we aren't touching in this proposal).

> I think that Twisted and ZServer may currently lean slightly towards the 
> first model, but the second model seems more "portable" to me, in terms 
> of being doable for multiple frameworks.  In theory, one could perhaps 
> even run an "LRSP-single" app in an "LRSP-multi" gateway simply by 
> having one's 'runCGI()' acquire and release a global lock at entry and 
> exit.  It also simplifies things from a container-configuration point of 
> view, as there is only one service object to keep track of.

Yes, I think nested services make the most sense here, where a single 
service is called from multiple threads, then dispatches from there.  It 
can query its sub-service in whatever ad hoc way is necessary to 
determine whether an object needs to be instantiated or can be reused.

And, perhaps gateways should be encouraged to implement only 
LRSP-single, and again allow for a threaded service that spawns threads 
and calls a subservice.

While the LRSP-single app could run in LRSP-multi with a lock, this 
seems unlikely to work well...?  Or would it be okay, because it's 
naturally short running...?  I suppose only the preforking model 
wouldn't work in the LRSP, since it's likely to be both blocking and not 
safe for concurrent use in a single process (at least typically).
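
For reference, the lock idea is only a few lines. This is a sketch under the assumption that a service is a plain callable taking one request; serialized is a made-up helper name, and the real runCGI signature may well differ.

```python
import threading


def serialized(service):
    """Wrap a single-threaded service for use from many threads.

    Every call acquires the same lock on entry and releases it on
    exit, so at most one request runs inside the service at a time.
    """
    lock = threading.Lock()

    def locked_service(request):
        with lock:
            return service(request)

    return locked_service
```

The cost is that requests queue behind each other, so one slow or blocking request stalls every other thread, which is the reason for the doubt above.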

I haven't done async much, so that seems more confusing to me.  At some 
point you need to return control, most likely before you have completed 
the request, and I'm not clear if there are well-defined protocols for 
this.  But I don't really know much about it.

With all this talk of services that are also gateways, it makes me 
wonder if we should make that idea more explicit, of various levels of 
delegation.

But then, while it seems like an elegant way to implement a system 
(chaining components), it would be a total pain to configure such nested 
systems.  So... either all the more reason to avoid configuration, or 
these ideas should be collapsed to make them easier to understand for 
end users (i.e., the system administrator-like people who set up the 
software).

   Ian