[Web-SIG] Web Container Interface

Thu Jan 29 00:13:20 EST 2004

On Jan 28, 2004, at 5:45 PM, Phillip J. Eby wrote:
> At 04:39 PM 1/28/04 -0600, Ian Bicking wrote:
>>>> In a long-running model, the framework also needs to know when the 
>>>> environment is shutting down (so it can write data out to disk).
>>>> Maybe atexit would be sufficient, I'm not sure (it's not what we 
>>>> use now).
>>>
>>> And what do you do now when somebody does a "kill -9" on the 
>>> process, or the machine reboots?  Python doesn't even guarantee that 
>>> all objects in a process will be finalized during a *normal* exit, 
>>> so how can any Python container guarantee a finalization notice?  
>>> I'd rather we didn't promise what isn't deliverable, or anything 
>>> that starts blurring responsibilities between container and service.
>>
>> We can't guarantee anything, but at least we can give the application 
>> a fighting chance.
>
> -1.  Keeping its data safe is the application's responsibility.  
> Blurring the responsibility over into the container doesn't help, it 
> *hurts*.
>
> This is a difference between defining an interface, or language-level 
> issue, and a framework.  If this were a framework, I'd certainly want 
> to provide services for something like this, should the issue be 
> within scope for the framework.  But WSGI isn't a framework, it's an 
> interface.  And interfaces should 1) clearly delineate 
> responsibilities between the sides of the interface, and 2) not be 
> vague about what you can or can't express.  "We'll try" is something 
> to be avoided in interfaces, because it doesn't *mean* anything.  
> Instead, it encourages app writers to assume that it's taken care of, 
> burdens gateway authors with additional functionality that they'll 
> boilerplate in, and generally makes a mess of portability.

So what alternative do you propose for handling a shutdown?  The 
application *needs* to know about this.  I don't think I trust atexit 
(though I'm open to it if it really would work).  Also, if the 
application spawns threads, they will simply block shutdown unless they 
can be told to stop.

I don't care about blurring of responsibilities nearly as much as 
utility.  You want to convert other frameworks, well then you have to 
convert their functionality.  Spawned threads in the application exist.  
Resources (threads included) that have to be explicitly cleaned up on 
shutdown exist.  If the gateway is the master process, then it has to 
handle the application's requirements.  If you have another way to deal 
with these, fine, bring it forth -- if not, then this proposal won't 
work.
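
To make the requirement concrete, here is a minimal sketch of the kind of 
hook I mean.  Everything in it is hypothetical: the names Worker, 
Application and shutdown() are made up for illustration and are not part of 
any proposal.  It only shows an application thread that keeps running until 
it is told to stop, and a single call the gateway could make before exiting.

```python
import threading


class Worker(threading.Thread):
    """Hypothetical application thread that runs until told to stop."""

    def __init__(self):
        super().__init__()
        self.stop_requested = threading.Event()
        self.flushed = False

    def run(self):
        # Loop until stop() is called; periodic work would go in the body.
        while not self.stop_requested.wait(timeout=0.05):
            pass
        # Stands in for writing in-memory data out to disk.
        self.flushed = True

    def stop(self):
        self.stop_requested.set()
        self.join()


class Application:
    """Hypothetical application object owning a background thread."""

    def __init__(self):
        self.worker = Worker()
        self.worker.start()
        self.is_shut_down = False

    def shutdown(self):
        """Hypothetical hook the gateway would call before it exits."""
        if self.is_shut_down:
            return
        self.worker.stop()
        self.is_shut_down = True
```

Without some call like shutdown(), the non-daemon worker thread above never 
leaves its loop, which is the "block shutdown" case.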

I'm not trying to be difficult, I'm just trying to envision how I would 
adapt Webware's gateway and application (AppServer and Application) to 
this interface.  I don't think of Webware as being particularly 
featureful, so I'm surprised other people haven't run into these 
problems too, unless there are solutions that I'm missing.

The CGI protocol already passes through gateway information, in 
SERVER_SOFTWARE.  The client passes through information in User-Agent.  
User-Agent is already used heavily, and Webware does use 
SERVER_SOFTWARE for a couple of things (when IIS acts differently from 
Apache).  It might not be clean, but it gets the job done.  I know we 
use os.name a lot.  This is information applications need, and it is 
against convention to hide that information.

>> Truisms, I say!  Anyway, it's not about guessing.  It's about 
>> hard-coding behavior based on the environment, when it's called for 
>> to solve demonstrable problems.  You don't get OS-independent 
>> programs by hiding the operating system from the language (though 
>> people have tried).  And I don't think you get gateway-independent 
>> applications by hiding the gateway.
>
> That's what configuration is for.  The deployer/integrator should be 
> allowed to control the app's behavior.

The best documentation is when no documentation is needed.  That's my 
truism ;)  When we figure something out, I'd rather put that knowledge 
into code, instead of documentation.  And every piece of configuration 
requires documentation (and it doesn't even save you any code).

>>> If the app or framework can choose its behaviors, let it make those 
>>> options explicit, as part of its configuration.
>>
>> Configuration sucks!  If the application is not behaving properly in 
>> its environment, it's a bug.  This is open source (at least, every 
>> implementation I care about will be), if you can't get the upstream 
>> to fix the bug, you can always fix it yourself.
>>
>> That may not be true with horribly bloated or closed-source software, 
>> but I don't think we should use bad experiences with that sort of 
>> software to color our vision here.
>
> I guess we'll have to agree to disagree on this.  In my view, an 
> application that does not permit explicit configuration for 
> compatibility with a custom environment is unsuitable for deployment 
> in an enterprise production system.  Also, with respect to open 
> source, please keep in mind that while Python itself is open source, 
> there are lots of Python users who develop closed-source applications 
> with it.

I don't expect glue to be closed source, so I don't think these 
problems should be part of closed source components.  The real 
applications people build, which may be proprietary, should be 
insulated from most of these issues.  (There do exist one or two 
proprietary-source Python web frameworks, but they seem rather obscure, 
and I doubt they are closed-source)

>>>> Most asynch environments can be turned into threaded systems after 
>>>> runCGI.  Webware, at least, is threaded at the point runCGI is 
>>>> called (maybe to its detriment), but many systems are not 
>>>> (including Zope, I think, and probably CherryPy).
>>>
>>> I actually don't understand what you mean here.  But I'll try and 
>>> tackle definitions for this in a subsequent PEP draft.
>>
>> Well, what you were referring to just above.  In an LRSP-single 
>> process you can always just spawn a thread, and turn it into an 
>> LRSP-multi environment.  So the distinction is a little vague.
>
> Nope.  An LRSP-single process by definition handles only one request 
> at any moment in time, so spawning a thread doesn't change that.

Now I'm confused.  If it's a single process, and handles only one 
request, isn't that just broken?  I don't know of any example of such a 
server, since it wouldn't be able to handle concurrent requests.

>> I assume LRSP-single is async, and LRSP-multi is threaded?
>
> LRSP-single means only one thread.  LRSP-multi means multi-threaded.  
> Asynchronousness actually implies LRSP-multi, because if you're doing 
> an asynchronous event loop the only way you can afford to call a 
> blocking 'runCGI()' is to do it in a thread.  Twisted and ZServer are 
> asynchronous LRSP-multi.

Okay.  This seems to mean that Twisted wouldn't use this interface 
internally, since they don't want to unnecessarily spawn a thread, and 
the interface doesn't seem to allow for a non-blocking API.

OTOH, to support async applications we'd have to standardize the model, 
I suppose, and there are several models of passing around control, 
right?  (Deferred, callbacks, etc.)  So maybe that's too hard.
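
As an illustration of the "asynchronous LRSP-multi" arrangement described 
above, here is a rough sketch of an event loop handing a blocking call off 
to a thread.  It uses Python's asyncio as a stand-in for a Twisted-style 
loop, and the run_cgi(stdin, stdout, environ) signature is an assumption, 
not something the proposal defines.

```python
import asyncio
import io
import time


def run_cgi(stdin, stdout, environ):
    """Hypothetical blocking application entry point (assumed signature)."""
    time.sleep(0.01)  # stands in for blocking work such as a database call
    stdout.write("Status: 200 OK\r\n\r\nhello " + environ["PATH_INFO"])


async def handle_request(environ):
    # The event loop must not block, so the synchronous call is pushed
    # onto a worker thread from the loop's default executor.
    loop = asyncio.get_running_loop()
    stdout = io.StringIO()
    await loop.run_in_executor(None, run_cgi, io.StringIO(""), stdout, environ)
    return stdout.getvalue()


async def serve(paths):
    # Several requests in flight at once, each one in its own thread.
    requests = (handle_request({"PATH_INFO": path}) for path in paths)
    return await asyncio.gather(*requests)
```

Each request costs a thread here, which is the overhead an async server 
would want to avoid for its own internals.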

> By contrast, synchronous servers can be LRSP-single or LRSP-multi.  
> For example, BaseHTTPServer is synchronous, and can be single or 
> multi-threaded.

But BaseHTTPServer without threads is kind of a silly thing, right?  
You can play with it, but not really use it for anything real.

>> While the LRSP-single app could run in LRSP-multi with a lock, this 
>> seems unlikely to work well...?  Or would it be okay, because it's 
>> naturally short running...?  I suppose only the preforking model 
>> wouldn't work in the LRSP, since it's likely to be both blocking and 
>> not safe for concurrent use in a single process (at least typically).
>
> I'm not sure that we should describe applications themselves in terms 
> of the process model as such.  That is, I think we might refer to a 
> "threadable" application as one that may be run in LRSP-threaded, and 
> "multiprocess safe" as an application that does not require a single 
> process.  These dimensions don't quite map onto the four process 
> models, but rather prescribe what models are *not* usable with that 
> app.  That is, an application that *isn't* threadable can't run in 
> LRSP-multi, and an application that *isn't* multiprocess-safe can't 
> run in prefork or fork-and-die.  (i.e., it can only run in one of the 
> LRSP models.)
>
>
>
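
The two dimensions quoted above can be read as a small exclusion rule.  
This is only a sketch of that reading: the function name and the flag names 
are made up, and the four model names are simply the ones used in this 
thread.

```python
# The four process models discussed in this thread.
PROCESS_MODELS = ("LRSP-single", "LRSP-multi", "prefork", "fork-and-die")


def usable_models(threadable, multiprocess_safe):
    """Return the models an application can run under (hypothetical helper).

    The flags do not pick a model; they only rule models out.
    """
    excluded = set()
    if not threadable:
        # Not safe to run concurrently in threads.
        excluded.add("LRSP-multi")
    if not multiprocess_safe:
        # Requires a single process, so no forking models.
        excluded.update(("prefork", "fork-and-die"))
    return [model for model in PROCESS_MODELS if model not in excluded]
```

For example, usable_models(threadable=True, multiprocess_safe=False) leaves 
only the two LRSP models.
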
>> I haven't done async much, so that seems more confusing to me.  At 
>> some point you need to return control, most likely before you have 
>> completed the request, and I'm not clear if there are well-defined 
>> protocols for this.  But I don't really know much about it.
>
> WSGI is intentionally synchronous; you need LRSP-multi (i.e. threads) 
> to run it in an asynchronous web server.  I don't know if this is much 
> of a problem in practice, but I know that both Twisted and ZServer 
> support LRSP-multi in their asynchronous servers.
>
>
>> With all this talk of services that are also gateways, it makes me 
>> wonder if we should make that idea more explicit, of various levels 
>> of delegation.
>
> Yeah, there's still not even a good consensus as to what to call 
> either side of the interface.
>
> One thing that amazes me about this whole discussion is how something 
> so incredibly simple can become so complicated as soon as you have to 
> explain it to somebody else.  :)
>
>
>> But then, while it seems like an elegant way to implement a system 
>> (chaining components), it would be a total pain to configure such 
>> nested systems.  So... either all the more reason to avoid 
>> configuration, or these ideas should be collapsed to make them easier 
>> to understand for end users (i.e., the system administrator-like 
>> people who set up the software).
>
> I think that most chaining will take place on the "application" side, 
> not the "container" side.  By that I mean I would expect an 
> application to be packaged as a single "service", even if internally 
> it's composed of routers and adapters and who knows what.
>
> Of course, if you then want to integrate that app with others to be 
> deployed within the same container, then you as the application 
> integrator are bundling them together into a new, higher-level 
> "application".

I was thinking about this, as in my head I went down a 
what-if-everything-was-a-filter approach, where each feature was a step 
in a chain of these runCGI calls.  Anyway, it works to a point, but the 
opaqueness of stdout/stdin/environ made me realize it would fall down 
long before you could get to any specific code.

There are a lot of useful transformations that couldn't be easily 
determined from stdin/environ, and a lot of output transformations that 
would be difficult to apply to stdout.  Well, they all could be 
implemented, but that involves constant construction and deconstruction 
of various pieces of the request, and the construction of faux-stdouts 
that can be pulled apart and also reconstructed.
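
A small sketch of what that deconstruction looks like, even for a trivial 
output filter.  The run_cgi(stdin, stdout, environ) signature and the names 
app and uppercase_filter are assumptions for illustration only.

```python
import io


def app(stdin, stdout, environ):
    """Hypothetical terminal application (assumed runCGI-style signature)."""
    stdout.write("Content-Type: text/plain\r\n\r\nhello world")


def uppercase_filter(inner):
    """Wrap an application and upper-case its response body."""

    def run_cgi(stdin, stdout, environ):
        # The filter cannot see the response as structured data, so it
        # captures everything in a faux-stdout first.
        fake_stdout = io.StringIO()
        inner(stdin, fake_stdout, environ)
        # Pull the captured output apart into headers and body...
        headers, _, body = fake_stdout.getvalue().partition("\r\n\r\n")
        # ...and reconstruct it for whoever sits above this filter.
        stdout.write(headers + "\r\n\r\n" + body.upper())

    return run_cgi
```

Every additional filter in the chain repeats the same capture, parse and 
rebuild cycle.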

> In other words, I don't expect this to be much of a problem in 
> practice, because whoever's dealing with a given integration level is 
> unlikely to deal with any components "below" the level they're 
> integrating them at.  Does that make sense?

Yes -- the application (in whatever form), comes as one object, and 
that object may reference others, or it may not.  As long as you don't 
have to instantiate different chains of gateways depending on the 
combination of terminal gateways and applications (like if the gateways 
adapted the semantics slightly to make the two compatible).  But that 
should probably be avoided, which is to say chained gateways should be 
avoided if possible.  It's really more of a clever way to implement 
things, than a useful way to distribute reusable components.

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org