[Web-SIG] A Python Web Application Package and Format

Eric Larson eric at ionrock.org
Tue Apr 12 01:04:01 CEST 2011


On Apr 11, 2011, at 2:48 PM, Alice Bevan–McGregor wrote:

> On 2011-04-11 00:53:02 -0700, Eric Larson said:
> 
>> Hi,
>> On Apr 10, 2011, at 10:29 PM, Alice Bevan–McGregor wrote:
>>> However, the package format I describe in that gist does include the source for the dependencies as "snapshotted" during bundling.  If your application is working in development, after snapshotting it /will/ work on sandbox or production deployments.
>> I wanted to chime in on this one aspect b/c I think the concept is somewhat flawed. If your application is working in development and you "snapshot" the dependencies, that is no guarantee that things will work in production. The only way to say that a snapshot or bundle is guaranteed to work is if you snapshot the entire system and make it available as the production system.
> 
> `pwaf bundle` bundles the source tarballs, effectively, of your application and dependencies into a single file.  Not unlike a certain feature of pip.
> 
> And… wait, am I the only one who uses built-from-snapshot virtual servers for sandbox and production deployment?  I can't be the only one who likes things to work as expected.
> 
>> Using a real world example, say you develop your application on OS X and you deploy on Ubuntu 8.04 LTS. Right away you are dealing with two different operating systems with entirely different system calls. If you use something like lxml and simplejson, you have no choice but to repackage or install from source on the production server.
> 
> Installing from source is what I was suggesting.  Also, Ubuntu on a server?  All your `linux single` (root) are belong to me.  ;^P
> 

I realize your intent was to install from source, and I'm saying that is the problem. Not from the standpoint of a Python web application on its own, of course, but from the standpoint of a Python web application working within the context of a larger system. A sandbox is nice b/c it gives you a place to do whatever you want and be somewhat oblivious to the rest of the world. My point is not that it's incorrect to install Python packages from source, but that assuming all dependencies should be installed from source is flawed. Just b/c a C extension needs some library (and its headers) to compile, it doesn't mean that the same development files are necessary to run. It is generally a good idea to keep compilers off of production machines.
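
To make that concrete, here is a minimal sketch of the split, assuming a Debian-flavored target and lxml as the example dependency (package names are illustrative and may vary by release):

# Build machine: a compiler and development headers are required to build lxml
apt-get install build-essential libxml2-dev libxslt1-dev
pip install lxml

# Production machine: only the shared libraries are needed at run time
apt-get install libxml2 libxslt1.1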

>> While it is fair to say that generally you could avoid packages that don't use C, both lxml and simplejson are rather obvious choices for web development.
> 
> Except that json is built-in in 2.6 (admittedly with fewer features, but I've never needed the extras) and there are alternate xml parsers, too.
> 

Ok, you are correct that there are other parsers and that the json module is built in. But we've already made a conscious decision to use lxml and simplejson instead of other tools (including the json module) because the alternatives are slower. These compiled packages have been very frustrating to deal with in production because they need to be compiled on the server. Along similar lines, we have our own Python apps that use C, and these are similarly very difficult to deploy, because our deployment system is built on setuptools and eggs (no zip). This is generally not a bad thing and speaks to the quality of Python as a platform. But the pain of having a very Python-centric system is substantial. My point is that while it is very convenient to install Python packages and let pip (and setuptools) handle our dependencies, that approach doesn't provide a way to interact with the host system that is housing our sandbox.

>> It sounds like Ian doesn't want to have any build steps, which I think is a bad mantra. A build step lets you prepare things for deployment. A deployment package is different from a development package, and mixing the two by forcing builds on the server seems like asking for trouble.
> 
> I'm having difficulty following this statement: build steps good, building on server bad?  So I take it you know the exact target architecture and have cross-compilers installed in your development environment?  That's not practical (or simple) at all!
> 

I'd think it is pretty bad practice to release software to production machines with no assumptions made about the target machine.

It doesn't have to be impractical. All it takes is an acknowledgement that the host system might need to supply some requirement, and a way to state that requirement that makes sense for your system. That is it. A list of package names that are installable via some system-level package manager might be more than enough; URLs to source packages might be fine. The idea is that we as Python application developers can make the lives of others who work with the system easier by providing a mechanism for communicating system-level dependencies.
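
Something as simple as this in the application's metadata could work (the format and names are purely illustrative, in the spirit of the hooks example further down):

system-dependencies: [
  "libxml2",  # a name the host's system-level package manager can resolve
  "http://mydomain.com/canonical/repo/dependency.tar.gz"  # or a URL to a source package
]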

>> I'm not saying this is what you (Alice) are suggesting, but rather pointing out that as a model, depending on virtualenv + pip's bundling capabilities seems slightly flawed.
> 
> Virtualenv (or something utilizing a similar Python path 'chrooting' capability) and pip using the extracted "deps" as the source for "offline" installation actually seems quite reasonable to me.  The benefit of a known set of working packages (i.e. specific version numbers, tested in development) and the ability to compile C extensions in-place.  (Because sure as hell you can't reliably compile them before-hand if they have any form of system library dependency!)
> 

I understand that this is not always that easy, so I agree it is not something I would prescribe out of the gate. But I would make the system agnostic as to whether or not you have to compile things on the server. Operating system vendors have all conquered the problem of releasing software to machines with far more variety than you'll ever see in a single production environment. It isn't an impossible, or even that difficult, idea to support. That said, I'm not suggesting creating the tools, or making it a requirement to deliver pre-built binary Python modules. My point is simply to make sure that doing so is possible and supported.

>> I think it should offer hooks for running tests, learning basic status and allowing simple configuration for typical sysadmin needs (logging via syslog, process management, nagios checks, etc.). Instead of focusing on what format that should take in terms of packages, it seems more effective to spend time defining a standard means of managing WSGI apps and piggyback on, or plain old copy, some format like RPMs or dpkg.
> 
> RPMs are terrible, dpkg is terrible.  Binary package distribution, in general, is terrible.  I got the distinct impression at PyCon that binary distributable .eggs were thought of as terrible and should be phased out.
> 

RPMs and dpkg packages are, at heart, just archives of files. You unpack them at the root of the file system and the files in the archive are "installed" in the correct place on the file system. Pip does the same basic thing, the exception being that you are unpacking into $prefix/lib/ instead. I think that model is excellent, and I said to copy it if need be. My only point is to realize that you are installing the package into a guest sandbox, so include some facility to communicate how the host system might need to meet some dependencies.
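
To illustrate with dpkg (the file name here is hypothetical; the commands are standard dpkg and binutils tools):

ar t mypackage.deb          # a .deb is an ar archive wrapping tarballs:
                            # debian-binary, control.tar.gz, data.tar.gz
dpkg-deb -c mypackage.deb   # list each file and where it lands on the file system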

> Also, nobody so far seems to have noticed the centralized logging management or daemon management lines from my notes.
> 
>> Just my .02. Again, I haven't offered code, so feel free to ignore me. But I do hope that if there are others who suspect this model of putting source on the server is a problem, they pipe up. If I were to add a requirement, it would be that Python web applications help system administrators become more effective. That means finding consistent ways of deploying apps that play well with other languages / platforms. After all, keeping a C compiler on a public server is rarely a good idea.
> 
> If you could demonstrate a fool-proof way to install packages with system library dependencies using cross-compilation from a remote machine, I'm all ears.  ;)
> 

pre-install-hooks: [
  "apt-get install libxml2",  # the person deploying the package assumes apt-get is available
  "run-some-shell-script.sh", # the shell script might do the following on a list of URLs
  "wget http://mydomain.com/canonical/repo/dependency.tar.gz && tar zxf dependency.tar.gz && rm dependency.tar.gz"
]

Does that make some sense? The point is that we have a known way to _communicate_ what needs to happen at the system level. I agree that there isn't a fool-proof way. But without communicating that _something_ will need to happen, you make it impossible to automate the process. You also make it very difficult to roll back if there is a problem, or to upgrade later. And you make it impossible to recognize that the library your C extension uses will actually break some other software on the system. Sure, you could use virtual machines, but if we don't want to tie ourselves to RPMs or dpkg, then why tie ourselves to VMware, VirtualBox, Xen or any of the other hypervisors and cloud vendors?
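
As a rough sketch of what declared dependencies buy you, the host's own package manager can then answer questions the application bundle alone cannot (standard apt commands, reusing the library from the example above):

apt-get install --dry-run libxml2   # preview what an install or upgrade would change
apt-cache rdepends libxml2          # see what else on the machine depends on it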

I hope I've made my point clearer. The idea is not to implement everything. But just as setuptools has provided helpful hooks like entry points to facilitate functionality, I'm suggesting that if this idea moves forward, similar hooks be made available to accommodate the host systems that will house our sandboxes.

Eric


> 	— Alice.
