[Web-SIG] Web Container Interface

Wed Jan 28 18:45:26 EST 2004

At 04:39 PM 1/28/04 -0600, Ian Bicking wrote:
>>>In a long-running model, the framework also needs to know when the 
>>>environment is shutting down (so it can write data out to disk).
>>>Maybe atexit would be sufficient, I'm not sure (it's not what we use now).
>>
>>And what do you do now when somebody does a "kill -9" on the process, or 
>>the machine reboots?  Python doesn't even guarantee that all objects in a 
>>process will be finalized during a *normal* exit, so how can any Python 
>>container guarantee a finalization notice?  I'd rather we didn't promise 
>>what isn't deliverable, or anything that starts blurring responsibilities 
>>between container and service.
>
>We can't guarantee anything, but at least we can give the application a 
>fighting chance.

-1.  Keeping its data safe is the application's responsibility.  Blurring 
the responsibility over into the container doesn't help, it *hurts*.

This is the difference between defining an interface (or a language-level
issue) and defining a framework.  If this were a framework, I'd certainly want to
provide services for something like this, should the issue be within scope 
for the framework.  But WSGI isn't a framework, it's an interface.  And 
interfaces should 1) clearly delineate responsibilities between the sides 
of the interface, and 2) not be vague about what you can or can't 
express.  "We'll try" is something to be avoided in interfaces, because it 
doesn't *mean* anything.  Instead, it encourages app writers to assume that 
it's taken care of, burdens gateway authors with additional functionality
that they'll each end up implementing as boilerplate, and generally makes a mess of portability.
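To make that division of responsibility concrete, here is a minimal Python sketch (the class and file names are hypothetical) of an application that keeps its own data safe by persisting state before each request completes. It needs neither atexit nor a container shutdown notice. It assumes a simple JSON state file and uses an atomic os.replace(), so a crash or "kill -9" in mid-write leaves the previously saved file intact.

```python
import json
import os
import tempfile


class CounterApp:
    """Hypothetical app that saves its own state on every change instead of
    relying on atexit or a container shutdown hook to flush it."""

    def __init__(self, path):
        self.path = path
        self.count = 0
        if os.path.exists(path):
            # Whatever was last saved survives a crash, kill -9, or reboot.
            with open(path) as f:
                self.count = json.load(f)["count"]

    def _save(self):
        # Write to a temp file in the same directory, flush it to disk, then
        # swap it into place with os.replace(), which is atomic on POSIX:
        # the state file is always either the old or the new version.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w") as f:
                json.dump({"count": self.count}, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)
            raise

    def handle_request(self):
        self.count += 1
        self._save()  # durable before the request is reported as done
        return self.count
```

Nothing here asks anything of the container; a fresh CounterApp built after a restart simply picks up the last saved count.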

>Truisms, I say!  Anyway, it's not about guessing.  It's about hard-coding 
>behavior based on the environment, when it's called for to solve 
>demonstrable problems.  You don't get OS-independent programs by hiding 
>the operating system from the language (though people have tried).  And I 
>don't think you get gateway-independent applications by hiding the gateway.

That's what configuration is for.  The deployer/integrator should be 
allowed to control the app's behavior.
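A rough sketch of what explicit configuration could look like on the application side. The app and its option names are invented for illustration. The deployer supplies plain strings (from an ini file, say), and the app turns them into typed settings instead of guessing from its environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    # Hypothetical options: the deployer decides, not the container.
    data_dir: str = "/var/lib/myapp"
    save_on_every_request: bool = True
    worker_threads: int = 1


def _as_bool(text):
    return text.strip().lower() in ("1", "true", "yes", "on")


def config_from_options(options):
    """Build an AppConfig from deployer-supplied strings.

    `options` is any mapping with a .get(key, default) method, such as a
    plain dict or a configparser section.
    """
    defaults = AppConfig()
    return AppConfig(
        data_dir=options.get("data_dir", defaults.data_dir),
        save_on_every_request=_as_bool(
            options.get("save_on_every_request", str(defaults.save_on_every_request))
        ),
        worker_threads=int(
            options.get("worker_threads", str(defaults.worker_threads))
        ),
    )
```

Because configparser section proxies also accept .get(key, fallback), the same function works on a section read from a deployer's ini file.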

>>If the app or framework can choose its behaviors, let it make those 
>>options explicit, as part of its configuration.
>
>Configuration sucks!  If the application is not behaving properly in its 
>environment, it's a bug.  This is open source (at least, every 
>implementation I care about will be), if you can't get the upstream to fix 
>the bug, you can always fix it yourself.
>
>That may not be true with horribly bloated or closed-source software, but I
>don't think we should use bad experiences with that sort of software to 
>color our vision here.

I guess we'll have to agree to disagree on this.  In my view, an 
application that does not permit explicit configuration for compatibility 
with a custom environment is unsuitable for deployment in an enterprise 
production system.  Also, with respect to open source, please keep in mind 
that while Python itself is open source, there are lots of Python users who 
develop closed-source applications with it.

>>>Most asynch environments can be turned into threaded systems after 
>>>runCGI.  Webware, at least, is threaded at the point runCGI is called 
>>>(maybe to its detriment), but many systems are not (including Zope, I 
>>>think, and probably CherryPy).
>>
>>I actually don't understand what you mean here.  But I'll try and tackle 
>>definitions for this in a subsequent PEP draft.
>
>Well, what you were referring to just above.  In an LRSP-single process 
>you can always just spawn a thread, and turn it into an LRSP-multi 
>environment.  So the distinction is a little vague.

Nope.  An LRSP-single process by definition handles only one request at any 
moment in time, so spawning a thread doesn't change that.

>I assume LRSP-single is async, and LRSP-multi is threaded?

LRSP-single means only one thread.  LRSP-multi means 
multi-threaded.  Asynchronousness actually implies LRSP-multi, because if 
you're doing an asynchronous event loop the only way you can afford to call 
a blocking 'runCGI()' is to do it in a thread.  Twisted and ZServer are 
asynchronous LRSP-multi.

By contrast, synchronous servers can be LRSP-single or LRSP-multi.  For 
example, BaseHTTPServer is synchronous, and can be single or multi-threaded.
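For reference, BaseHTTPServer lives on in Python 3 as http.server, and the single- versus multi-threaded choice comes down to the server class. HTTPServer handles one request at a time, while ThreadingHTTPServer (Python 3.7+) mixes in socketserver.ThreadingMixIn to handle each request in its own thread. The handler below is a minimal synchronous example. The make_server(), serve_in_background() and fetch() helpers are illustrative and not part of the library.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    # A synchronous handler: do_GET runs to completion for each request.
    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(threaded, port=0):
    # Same handler either way; only the server's concurrency model differs.
    # port=0 lets the OS pick any free port.
    server_class = ThreadingHTTPServer if threaded else HTTPServer
    return server_class(("127.0.0.1", port), HelloHandler)


def serve_in_background(server):
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def fetch(server):
    host, port = server.server_address[:2]
    # Bypass any proxy settings so the request goes straight to loopback.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}/") as response:
        return response.read()
```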

>Yes, I think nested services make the most sense here, where a single 
>service is called from multiple threads, then dispatches from there.  It 
>can query its sub-service in whatever adhoc way that is necessary, to 
>determine whether an object needs to be instantiated or can be reused.

I'm really baffled at why things need to be so complicated, but as long as 
the complexity is kept well away from the interface, I'm happy.  :)

>And, perhaps gateways should be encouraged to implement only LRSP-single, 
>and again allow for a threaded service that spawns threads and calls a 
>subservice.

That would be LRSP-multi, because 'runCGI()' is synchronous.  It returns 
only when the request is finished.  Thus, the *only* way to do multiple 
requests at once is for the *container* to be threaded.  That's an 
intentional feature of the design.
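A small simulation of that point, assuming a hypothetical synchronous app callable that takes an environ-like dict. Because each call blocks until its request is finished, the only knob that changes concurrency is how many threads the container uses: with one worker thread it behaves like LRSP-single, with several like LRSP-multi.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Synchronous app stand-in that records how many calls overlap."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, environ):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            time.sleep(0.05)  # the call blocks until the "request" is done
            return environ["PATH_INFO"]
        finally:
            with self._lock:
                self.active -= 1


def run_container(app, requests, worker_threads):
    # The container, not the app, decides how many requests run at once:
    # worker_threads=1 acts like LRSP-single, >1 like LRSP-multi.
    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        return list(pool.map(app, requests))
```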

>While the LRSP-single app could run in LRSP-multi with a lock, this seems 
>unlikely to work well...?  Or would it be okay, because it's naturally 
>short running...?  I suppose only the preforking model wouldn't work in 
>the LRSP, since it's likely to be both blocking and not safe for 
>concurrent use in a single process (at least typically).

I'm not sure that we should describe applications themselves in terms of 
the process model as such.  That is, I think we might refer to a
"threadable" application as one that may be run in LRSP-multi, and
"multiprocess safe" as an application that does not require a single 
process.  These dimensions don't quite map onto the four process models, 
but rather prescribe what models are *not* usable with that app.  That is, 
an application that *isn't* threadable can't run in LRSP-multi, and an 
application that *isn't* multiprocess-safe can't run in prefork or 
fork-and-die.  (i.e., it can only run in one of the LRSP models.)
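The rule above is simple enough to state as code. This sketch uses invented names for the four process models and for the two application properties ("threadable" and "multiprocess safe"); it just computes which models remain usable for a given app.

```python
from enum import Enum


class ProcessModel(Enum):
    LRSP_SINGLE = "long-running single process, one thread"
    LRSP_MULTI = "long-running single process, many threads"
    PREFORK = "prefork"
    FORK_AND_DIE = "fork-and-die"


def usable_models(threadable, multiprocess_safe):
    """Return the process models an application can run under.

    The two properties only rule models out; they never add any.
    """
    models = set(ProcessModel)
    if not threadable:
        models.discard(ProcessModel.LRSP_MULTI)
    if not multiprocess_safe:
        # Needs a single process, so only the LRSP models remain.
        models -= {ProcessModel.PREFORK, ProcessModel.FORK_AND_DIE}
    return models
```

An app that is neither threadable nor multiprocess safe ends up with LRSP-single as its only option.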

>I haven't done async much, so that seems more confusing to me.  At some 
>point you need to return control, most likely before you have completed 
>the request, and I'm not clear if there are well-defined protocols for 
>this.  But I don't really know much about it.

WSGI is intentionally synchronous; you need LRSP-multi (i.e. threads) to 
run it in an asynchronous web server.  I don't know if this is much of a 
problem in practice, but I know that both Twisted and ZServer support 
LRSP-multi in their asynchronous servers.
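As an illustration of how an asynchronous server can host a synchronous call, here is a sketch using Python's asyncio, where loop.run_in_executor() hands the blocking call to a worker thread so the event loop stays responsive. The run_cgi() function is a hypothetical stand-in for the blocking application call, not the draft's actual signature.

```python
import asyncio
import threading
import time


def run_cgi(environ):
    """Hypothetical blocking application call: returns only when done."""
    time.sleep(0.05)  # simulate blocking work
    return "handled " + environ["PATH_INFO"], threading.get_ident()


async def handle_request(environ):
    loop = asyncio.get_running_loop()
    # Calling run_cgi() directly would stall every other connection, so hand
    # it to a worker thread (the loop's default executor) and await the result.
    return await loop.run_in_executor(None, run_cgi, environ)


async def serve(paths):
    # Several "requests" in flight at once; the event loop thread stays free.
    return await asyncio.gather(
        *(handle_request({"PATH_INFO": p}) for p in paths)
    )
```

Running asyncio.run(serve(["/a", "/b", "/c"])) returns the results in request order, each computed on a worker thread rather than the event loop's thread.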

>With all this talk of services that are also gateways, it makes me wonder 
>if we should make that idea more explicit, of various levels of delegation.

Yeah, there's still not even a good consensus as to what to call either 
side of the interface.

One thing that amazes me about this whole discussion is how something so 
incredibly simple can become so complicated as soon as you have to explain 
it to somebody else.  :)

>But then, while it seems like an elegant way to implement a system 
>(chaining components), it would be a total pain to configure such nested 
>systems.  So... either all the more reason to avoid configuration, or 
>these ideas should be collapsed to make them easier to understand for end 
>users (i.e., the system administrator-like people who set up the software).

I think that most chaining will take place on the "application" side, not 
the "container" side.  By that I mean I would expect an application to be 
packaged as a single "service", even if internally it's composed of routers 
and adapters and who knows what.
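A sketch of that kind of internal composition, written against the application signature WSGI eventually adopted in PEP 333/3333 (app(environ, start_response)) rather than the runCGI() draft under discussion here. The make_app(), prefix_router() and call() helpers are hypothetical. The point is that a router is itself just an application, so routers can nest and the container only ever sees the outermost one.

```python
def make_app(text):
    """Hypothetical leaf application that always returns `text`."""
    def app(environ, start_response):
        body = text.encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


def prefix_router(routes):
    """Compose sub-applications into a single application by path prefix.

    The result has the same signature as its parts, so routers can nest.
    """
    def router(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in routes.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Shift the matched prefix from PATH_INFO to SCRIPT_NAME,
                # working on a copy so the caller's environ is untouched.
                sub_environ = dict(environ)
                sub_environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                sub_environ["PATH_INFO"] = path[len(prefix):]
                return app(sub_environ, start_response)
        body = b"Not Found"
        start_response("404 Not Found", [("Content-Type", "text/plain"),
                                         ("Content-Length", str(len(body)))])
        return [body]
    return router


def call(app, path):
    """Tiny driver standing in for a container (no real server involved)."""
    statuses = []

    def start_response(status, headers):
        statuses.append(status)

    body = b"".join(app({"PATH_INFO": path, "SCRIPT_NAME": ""}, start_response))
    return statuses[0], body
```

For example, a "blog" router over "/posts" and "/tags" can itself be mounted at "/blog" inside a site-level router, and the container is handed only the site-level router.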

Of course, if you then want to integrate that app with others to be 
deployed within the same container, then you as the application integrator 
are bundling them together into a new, higher-level "application".

In other words, I don't expect this to be much of a problem in practice, 
because whoever's dealing with a given integration level is unlikely to 
deal with any components "below" the level they're integrating them 
at.  Does that make sense?