[Web-SIG] A Python Web Application Package and Format

Ian Bicking ianb at colorstudy.com
Thu Apr 14 19:34:59 CEST 2011


I think there's a general concept we should have, which I'll call a "script"
-- but basically it's a script to run (__main__-style), a callable to call
(module:name), or a URL to fetch internally.  I want to keep this distinct
from anything long-running, which is a much more complex deal.  I think
given the three options, and for general simplicity, the script can be
successful or have an error (for Python code: exception or no; for __main__:
zero exit code or no; for a URL: 2xx code or no), and can return some text
(which may only be informational, not structured?).

An application configuration could refer to scripts under different names,
to be invoked at different stages.
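For concreteness, a sketch of that three-way "script" idea -- all names here are illustrative, nothing is being proposed as an API:

```python
import subprocess
import sys
from importlib import import_module

def run_script(spec, url_opener=None):
    """Run a "script" and report success as an (ok, text) pair.

    spec is one of:
      ('file', 'path/to/script.py')  -- run __main__-style; ok = exit code 0
      ('callable', 'module:name')    -- import and call; ok = no exception
      ('url', '/internal/path')      -- fetch via a container-provided
                                        opener; ok = 2xx status
    """
    kind, target = spec
    if kind == 'file':
        proc = subprocess.run([sys.executable, target],
                              capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout
    if kind == 'callable':
        module_name, func_name = target.split(':')
        try:
            func = getattr(import_module(module_name), func_name)
            return True, str(func() or '')
        except Exception as exc:
            return False, str(exc)
    if kind == 'url':
        status, body = url_opener(target)
        return 200 <= status < 300, body
    raise ValueError('unknown script kind: %r' % (kind,))
```

The returned text is purely informational in all three cases, which keeps the contract uniform.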

On Thu, Apr 14, 2011 at 1:57 AM, Alice Bevan–McGregor
<alice at gothcandy.com> wrote:

> On 2011-04-13 18:16:36 -0700, Ian Bicking said:
>
>  While I was initially reluctant to use zip files, after further discussion
>> and thought they seem fine to me, so long as any tool that takes a zip file can
>> also take a directory.  The reverse might not be true -- for instance, I'd
>> like a way to install or update a library for (and inside) an application,
>> but I doubt I would make pip rewrite zip files to do this ;)  But it could
>> certainly work on directories.  Supporting both isn't a big deal except that
>> you can't do symlinks in a zip file.
>>
>
> I'm not talking about using zip files as per eggs, where the code is
> maintained within the zip file during execution.  It is merely a packaging
> format with the software itself extracted from the zip during installation /
> upgrade.  A transitory container format.  (Folders in the end.)
>
> Symlinks are an OS-specific feature, so those are out as a core
> requirement.  ;)
>
>
>  I don't think we're talking about something like a buildout recipe.  Well,
>> Eric kind of brought something like that up... but otherwise I think the
>> consensus is in that direction.
>>
>
> Ambiguous statements FTW, but I think I know what you meant.  ;)
>
>
>  So specifically if you need something like lxml the application specifies
>> that somehow, but doesn't specify *how* that library is acquired.  There is
>> some disagreement on whether this is generally true, or only true for
>> libraries that are not portable.
>>
>
> +1
>
> I think something along the lines of autoconf (those lovely ./configure
> scripts you run when building GNU-style software from source) with published
> base 'checkers' (predicates as I referred to them previously) would be
> great.  A clear way for an application to declare a dependency, have the
> application server check those dependencies, then notify the administrator
> installing the package.
>

There could be an optional self-test script, where the application could do
a last self-check -- import whatever it wanted, check db settings, etc.  Of
course we'd want to know what it needed *before* the self-check to try to
provide it, but double-checking is of course good too.

One advantage to a separate script instead of just one script-on-install is
that you can more easily indicate *why* the installation failed.  For
instance, script-on-install might fail because it can't create the database
tables it needs, which is a different kind of error than a library not being
installed, or being fundamentally incompatible with the container it is in.
In some sense maybe that's because we aren't proposing a rich error system
-- but realistically a lot of these errors will be TypeError, ImportError,
etc., and trying to normalize those errors to some richer meaning is
unlikely to be done effectively (especially since error cases are hard to
test, as they are the things you weren't expecting).

I've seen several Python libraries that include the C library code that they
> expose; while not so terribly efficient (i.e. you can't install the C
> library once, then share it amongst venvs), it is effective for small
> packages.
>

Generally compiling seems fairly reliable these days, but it does typically
require that more system-level packages be installed (e.g., python-dev).
Actually invoking these installations in an automated and reliable way seems
hard to me.  I find debs/rpms to work well for these cases.  There is some
challenge when you need something that isn't packaged, but in many ways the
work you need to do is always going to be the same work you'd need to do to
package that library or the new version of that library.  So I'm inclined to
ask people to lean on the existing OS-level tooling for dealing with these
libraries.


> Larger (i.e. global or application-local) would require the intervention of
> a systems administrator.
>
>
>  Something like a database takes this a bit further.  We haven't really
>> discussed it, but I think this is where it gets interesting.  Silver Lining
>> has one model for this.  The general rule in Silver Lining is that you can't
>> have anything with persistence without asking for it as a service, including
>> an area to write files (except temporary files?)
>>
>
> +1
>
> Databases are slightly more difficult; an application could ask for:
>
> :: (Very Generic) A PEP-249 database connection.
>
> :: (Generic) A relational database connection string.
>
> :: (Specific) A connection string to a specific vendor of database.
>
> :: (Odd) A NoSQL database connection string.
>
> I've been making heavy use of MongoDB over the last year and a half, but
> AFAIK each NoSQL database engine does its own thing API-wise.  (Then there
> are ORMs on top of that, but passing a connection string like
> mysql://user:pass@host/db or mongo://host/db is pretty universal.)
>
> It is my intention to write an application server that is capable of
> creating and securing databases on-the-fly.  This would require fairly
> high-level privileges in the database engine, but would result in far more
> "plug-and-play" configuration.  Obviously when deleting an application you
> will have the opportunity to delete the database and associated user.


Categorizing services seems unnecessary.  I'd like to see maybe an |
operator, and a distinction between required and optional services.  E.g.:

require_service:
    - mysql | postgresql | firebird

Or:

require_service:
    - files
optional_service:
    - mysql | postgresql

And then there's a lot more you could do... indicating which one you prefer,
for instance.  Or things like GIS extensions to databases are tricky, as they
are somewhat orthogonal to other aspects of the database (for Silver Lining
I have a postgis service, which does extra setup and installation to give
you a GIS-enabled database).  But we can also just ask that when things get
tricky people just make fancier services (e.g., you could make a "dbapi"
service that itself figured out what specific backend to install).
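A container could resolve that strawman `|` syntax with a first-match-wins rule; a sketch (the spec keys and syntax are just the example above, not a proposal):

```python
def resolve_services(spec, available):
    """Pick one provider for each service line in the spec.

    spec: {'require_service': [...], 'optional_service': [...]}
    Each entry is a string of '|'-separated alternatives; the first
    alternative the container can provide wins.  A required service
    with no available provider raises LookupError; an optional one
    is silently skipped.
    """
    chosen = {}
    for line in spec.get('require_service', []):
        for name in (n.strip() for n in line.split('|')):
            if name in available:
                chosen[line] = name
                break
        else:
            raise LookupError('no provider for required service: %s' % line)
    for line in spec.get('optional_service', []):
        for name in (n.strip() for n in line.split('|')):
            if name in available:
                chosen[line] = name
                break
    return chosen
```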

Tricky things:
- You need something funny like multiple databases.  This is very
service-specific anyway, and there might sometimes need to be a way to
configure the service.  It's also a fairly obscure need.
- You need multiple applications to share data.  This is hard, not sure how
to handle it.  Maybe punt for now.


>
>  I suspect there's some disagreement about how the Python environment gets
>> setup, specifically sys.path and any other application-specific
>> customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in
>> silvercustomize.py, and find it helpful).
>>
>
> Similar to Paste's "here" variable for INI files, having some method for the
> application to define environment variables with base path references would
> be needed.
>

I always assume everything must be relative to the root of the directory.


> I've tossed out my idea of sharing dependencies, BTW, so a simple
> extraction of the zipped application into one package folder (linked in
> using a .pth file) with the dependencies installed into an app-packages
> folder in the path (like site-packages) would be ideal.  At least, for me.
>  ;)
>
>
>  Describing the scope of this, it seems kind of boring.  In, for example,
>> App Engine you do all your setup in your runner -- I find this deeply
>> annoying because it makes the runner the only entry point, and thus makes
>> testing, scripts, etc. hard.
>>
>
> I agree; that's a short-sighted approach to an application container
> format.  There should be some way to advertise a test suite and, for
> example, have the suite run before installation or during upgrade.  (Rolling
> back the upgrade process thus far if there is a failure.)
>
> My shiny end goal would be a form of continuous deployment: a git-based
> application which gets a post-commit notification, pulls the latest, runs
> the tests, rolls back on failure or fully deploys the update on success.
>
>
>  We would start with just WSGI.  Other things could follow, but I don't see
>> any reason to worry about that now.  Maybe we should just punt on aggregate
>> applications now too.  I don't feel like there's anything we would do that
>> would prevent other kinds of runtime models (besides the starting point,
>> container-controlled WSGI), and the places to add support for new things are
>> obvious enough (e.g., something like Silver Lining's platform setting).  I
>> would define a server with accompanying daemon processes as an "aggregate".
>>
>
> Since in my model the application server does not proxy requests to the
> instantiated applications (each running in its own process), I'm not sure
> I'm interpreting what you mean by an aggregate application properly.
>

You mean, the application provides its own HTTP server?  I certainly
wouldn't expect that...?

Anyway, in terms of aggregate, I mean something like a "site" that is made
up of many "applications", and maybe those applications are interdependent
in some fashion.  That adds lots of complications, and though there's lots
of use cases for that I think it's easier to think in terms of apps as simpler
building blocks for now.


> If "my" application server managed Nginx or Apache configurations, dispatch
> to applications based on base path would be very easy to do while still
> keeping the applications isolated.


Sure; these would be tool options, and if you set everything up you are
requiring the deployer to invoke the tools correctly to get everything in
place.  Which is a fine starting point before formalizing anything.


>
>  An important distinction to make, I believe, is application concerns and
>> deployment concerns.  For instance, what you do with logging is a deployment
>> concern.  Generating logging messages is of course an application concern.
>> In practice these are often conflated, especially in the case of bespoke
>> applications where the only person deploying the application is the person
>> (or team) developing the application.  It shouldn't be annoying for these
>> users, though.  Maybe it makes sense for people to be able to include
>> tool-specific default settings in an application -- things that could be
>> overridden, but especially for the case when the application is not widely
>> reused it could be useful.  (An example where Silver Lining gets it all
>> backwards is that I created a [production] section in app.ini when the very
>> concept of "production" is not meaningful in that context -- but these kind
>> of named profiles would make sense for actual application deployment tools.)
>>
>
> Having an application define default logging levels for different scopes
> would be very useful.  The application server could take those defaults, and
> allow an administrator to modify them or define additional scopes quite
> easily.


Hm... I guess this is an ordering question.  You could import logging and
set up defaults, but that doesn't give the container a chance to overwrite
those defaults.  You could have the container set up logging, then make sure
the app sets defaults only when the container hasn't -- but I'm not sure if
it's easy to use the logging module that way.

Well, maybe that's not hard -- if you have something like silvercustomize.py
that is always imported, and imported fairly early on, then have the
container overwrite logging settings before it *does* anything (e.g., sends
a request) then you should be okay?
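The "app sets defaults only when the container hasn't" option is actually easy to express with the stdlib, since a configured root logger has handlers -- a sketch:

```python
import logging

def app_default_logging():
    """Application-side logging defaults, applied only when nothing
    (e.g. the container) has configured logging already.  A configured
    root logger has handlers attached, so that is the check."""
    root = logging.getLogger()
    if root.handlers:
        return  # the container got there first; leave its setup alone
    logging.basicConfig(level=logging.INFO)
```

So whichever side runs first wins, and the import-order question reduces to making sure the container configures logging before it imports the app.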


>
>  There's actually a kind of layered way of thinking of this:
>>
>> 1. The first, maybe most important part, is how you get a proper Python
>> environment.  That includes sys.path of course, with all the accompanying
>> libraries, but it also includes environment description.
>>
>
> Virtualenv-like, with the application itself linked in via a .pth file (a
> la setup.py develop, allowing inline upgrades via SCM) and dependencies
> extracted from the zip distributable into an app-packages folder a la
> site-packages.
>
> I don't install global Python modules on any of my servers, so the
> --no-site-packages option is somewhat unnecessary for me, but having
> something similar would be useful, too.  Unfortunately, that one feature
> seems to require a lot of additional work.
>
>
>  In Silver Lining there's two stages -- first, set some environmental
>> variables (both general ones like $SILVER_CANONICAL_HOST and
>> service-specific ones like $CONFIG_MYSQL_DBNAME), then get sys.path proper,
>> then import silvercustomize by which an environment can do any more
>> customization it wants (e.g., set $DJANGO_SETTINGS_MODULE)
>>
>
> Environment variables are typeless (raw strings) and thus less than optimal
> for sharing rich configurations.
>

Rich configurations are problematic in their own ways.  While the
str-key/str-value of os.environ is somewhat limited, I wouldn't want
anything richer than JSON (list, dict, str, numbers, bools).  And then we
have to figure out a place to drop the configuration.  Because we are
configuring the *process*, not a particular application or request handler,
a callable isn't great (unless we expect the callable to drop the config
somewhere and other things to pick it up?)
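One middle ground along those lines is JSON dropped straight into os.environ -- still str-key/str-value at the process level, but structured inside.  A sketch (the SERVICE_* naming is invented for illustration):

```python
import json
import os

def set_service_config(name, config):
    """Container side: serialize one service's config into the
    process environment as a JSON string."""
    os.environ['SERVICE_%s' % name.upper()] = json.dumps(config)

def get_service_config(name):
    """Application side: read it back as lists/dicts/numbers;
    returns None if the container didn't provide that service."""
    raw = os.environ.get('SERVICE_%s' % name.upper())
    return json.loads(raw) if raw is not None else None
```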

Host names depend on how the application is mounted, and a single
> application may be mounted to multiple domains or paths, so utilizing the
> front end web server's rewriting capability is probably the best solution
> for that.
>

I found at least giving one valid hostname (and yes, it should include a path)
was important for many applications.  E.g., a bunch of apps have tendencies
to put hostnames in the database.



> What about multiple database connections?  Environment variables are also
> not so good for repeated values.
>
> A /few/ environment variables are a good idea, though:
>
> :: TMPDIR — when don't you need temporary files?
>
> :: APP_CONFIG_PATH — the path to a YAML file containing the real
> configuration.
>

I'm not psyched about pointing to a file, though I guess it could work --
it's another kind of peculiar
drop-the-config-somewhere-and-wait-for-someone-to-pick-it-up.  At least
dropping it directly in os.environ is easy to use directly (many things
allow os.environ interpolation already) and doesn't require any temporary
files.  Maybe there's a middle ground.



> The configuration file would even include a dict-based logging
> configuration routing all messages to the parent app server for final
> delivery, removing the need for per-app logging files, etc.
>
>
>  2. Define some basic generic metadata.  "app_name" being the most obvious
>> one.
>>
>
> The standard Python setup metadata is pretty good:
>
> :: Application title.
> :: Short description.
> :: Long description / documentation.
> :: Author information.
> :: License.
> :: Source information (URL, download URL).
>

Sure.


> :: Application (package) name.
>

This doesn't seem meaningful to me -- there's no need for a one-to-one
mapping between these applications and a particular package.  Unless you
mean some attempt at a unique name that can be used for indexing?


> :: Dependencies.
>

Will require some more discussion, but something like this, sure.


> :: Entry point-style hooks.  (Post-install, pre/post upgrade, pre-removal,
> etc.)
>

Yes; I just made each entry point a top-level setting, instead of embedding
them into another setting.

> Likely others.
>
>
>  3. Define how to get the WSGI app.  This is WSGI specific, but (1) is
>> *not* WSGI specific (it's only Python specific, and would apply well to
>> other platforms)
>>
>
> I could imagine there would be multiple "application types":
>
> :: WSGI application.  Define a package dot-notation entry point to a WSGI
> application factory.
>
> :: Networked daemon.  This would allow deployment of Twisted services, for
> example.  Define a package dot-notation entry point to the 'main' callable.
>

It would also need a way to specify things like what port to run on, public
or private interface, maybe indicate what kind of proxying is valid (if
any), maybe process management parameters, ways to inspect the
process itself (since *maybe* you can't send internal HTTP requests into
it), etc.
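For concreteness, resolving a "module:name" dot-notation entry point (for either application type above) takes only a few lines; this is a sketch, not a proposed API:

```python
from importlib import import_module

def load_entry_point(spec):
    """Resolve 'package.module:factory' to the object it names.

    Dotted attribute access after the colon ('mod:obj.attr') is
    allowed, mirroring setuptools-style entry points; a spec with
    no colon just returns the module."""
    module_name, _, attrs = spec.partition(':')
    obj = import_module(module_name)
    for attr in filter(None, attrs.split('.')):
        obj = getattr(obj, attr)
    return obj

# e.g. app_factory = load_entry_point('myapp.wsgi:make_app')
#      wsgi_app = app_factory(config_dict)
```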

Again, there are likely others, but those are the big two.  In both of these
> cases the configuration (loaded automatically) could be passed as a dict to
> the callable.


PHP! ;)  Anyway, personally I'd like to keep in mind the idea of entirely
different platforms, but that's something I'm willing to just personally
keep in the back of my head and leave out of the discussion.  My experience
supporting PHP is that it was easier than I expected.  Obviously all tools
need not support all platforms.


>
>  4. Define some *web specific* metadata, like static files to serve.  This
>> isn't necessarily WSGI or even Python specific (not that we should bend
>> backwards to be agnostic -- but in practice I think we'd have to bend
>> backwards to make it Python-specific).
>>
>
> Explicitly defining the paths to static files is not just a good idea, it's
> The Slaw™.


I'm not personally that happy with how App Engine does it, as an example --
it requires a regex-based dispatch.


>
>  5. Define some lifecycle metadata, like update_fetch.  These are generally
>> commands to invoke.  IMHO these can be ad hoc, but exist in the scope of (1)
>> and a full "environment".  So it's not radically different than anything
>> else the app does, it's just we declare specific times these actions happen.
>>
>
> Script name, dot-notation callable, or URL.  I see those as the 'big three'
> to support.  Using a dot-notation callable has the same benefit as my
> comments to #3.
>
> The URL would be relative to wherever the application is mounted within a
> domain, of course.
>
>
>  6. Define services (or "resources" or whatever -- the name "resource"
>> doesn't make as much sense to me, but that's bike shedding).  These are
>> things the app can't provide for itself, but requires (or perhaps only
>> wants; e.g., an app might be able to use SQLite, but could also use
>> PostgreSQL).  While the list of services will increase over time, without a
>> basic list most apps can't run at all.  We also need a core set as a kind of
>> reference implementation of what a fully-specified service *is*.
>>
>
> I touched on this up above; any DBAPI compliant database or various
> configuration strings.  (I'd implement this as a string-like object with
> accessor properties so you can pass it to SQLAlchemy straight, or dissect it
> to do something custom.)
>

Anything "string-like" or otherwise fancy requires more support libraries
for the application to actually be able to make use of the environment.
Maybe necessary, but it should be done with great reluctance IMHO.
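For what it's worth, the string-like object described above barely needs a support library -- a str subclass built on the stdlib would do (a sketch; the attribute names are invented):

```python
from urllib.parse import urlsplit

class ConnectionString(str):
    """A connection string that is still a plain str (so it can be
    passed to SQLAlchemy et al. untouched) but exposes parsed pieces
    as properties."""

    @property
    def scheme(self):
        return urlsplit(self).scheme

    @property
    def host(self):
        return urlsplit(self).hostname

    @property
    def username(self):
        return urlsplit(self).username

    @property
    def database(self):
        return urlsplit(self).path.lstrip('/')
```

Though the point stands: even this much machinery has to ship with (or be importable by) the application.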

  Ian