Object Persistence in CGI (fwd)

Thu May 20 12:45:25 EDT 1999

I have had a heck of a time getting the post below to the newsgroup.  
If other folks have already seen it, please email me and complain 
(I cannot find my own attempts in either my newsfeed or
dejanews).

-----------------------------------------------------------------------
I am new to Python, but not to programming generally.  Although I have
written little more than "hello world" practice apps, I've read
enough about Python (in this newsgroup and elsewhere) that I've decided
to write a web-based vertical application in it.

The general idea of this application is that it will have a set of
authorized users, secured by Basic User Authentication.  Most users will
have access to a set of information concerning themselves specifically.
A smaller number of users will have general access to other users'
information, including for purposes of producing summaries of users'
status.  I will have to set up some sort of lookup against "super-user"
lists to check whether an operation on another user is permissible.  But
in most cases, the operation will just pull REMOTE_USER, and perform an
operation against that user's data (Apache authorization lists will use
the identical usernames as used in the Python CGIs).
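
To make the permission check concrete, here is a rough sketch of what I
have in mind.  The SUPER_USERS set and the function name are
placeholders I made up for illustration; the real list would probably
live in a file next to the Apache authorization lists.

```python
import os

# Hypothetical super-user list; the names here are made up for illustration.
SUPER_USERS = {"admin", "auditor"}


def may_operate_on(target_user, environ=os.environ):
    """Return True if the authenticated user may act on target_user's data."""
    # Apache sets REMOTE_USER once Basic authentication has succeeded.
    remote_user = environ.get("REMOTE_USER")
    if not remote_user:
        return False  # no authenticated user, so refuse
    # Users may always touch their own data; super-users may touch anyone's.
    return remote_user == target_user or remote_user in SUPER_USERS
```

In the common case the CGI would simply pass REMOTE_USER as the target,
so the check only matters for operations on somebody else's data.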

Each user will have her own object instance, built by composition in a
fixed hierarchy.  In general, I want to use Python's shelve/pickle
functions to store and retrieve these user objects.  Most of the time
users will read their objects, which will form the basis of various HTML
pages that present portions of this data.  Sometimes, however, users
will complete HTML forms that will instruct changes to their object
data; and this revised object will have to be stored back to disk.  Each
user object will be fairly small, on the order of a few K, but varying
between users.
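
For reference, the shelve approach I am considering would look roughly
like this.  It is only a sketch: the function names and the idea of
keying the shelf by username are my assumptions, and there is no
locking here at all.

```python
import shelve


def load_user(db_path, username):
    """Return the stored object for username, or None if there is none."""
    with shelve.open(db_path) as db:
        return db.get(username)


def save_user(db_path, username, user_obj):
    """Store user_obj (any picklable object) under username."""
    with shelve.open(db_path) as db:
        db[username] = user_obj
```

A form-handling CGI would call load_user, change the object, and call
save_user; a read-only page would only call load_user.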

The question I have here is about the best approach to storing this
persistent data.  Several concerns come to mind:

    1. Concurrency.  Most of the time, each user has her own object,
       but occasionally multiple users need to be able to read and/or
       modify the same user object.  If I use a dbm/shelve, the
       concurrency issue arises even for users (write-)accessing
       different objects.  If I create separate files for each user
       object, there is less of a problem here.

    2. Efficiency.  There are two speed problems I see as possible.  If
       I use a simple file locking mechanism against a shelve (on
       writes), that could keep the file locked long enough to cause
       delays for other users (especially as different CGI calls need to
       open and close the same dbm).  The second problem pulls in a
       slightly different direction.  For users creating summaries, it
       will be necessary to read a whole sequence of objects, extracting
       information from each.  I suspect a shelve (or SQL) would be
       quicker than accessing a large collection of files.

    3. Ease-of-implementation.  I expect to implement this application
       on a co-hosted server, using a "web-appliance".  This
       web-appliance is called "Cobalt RaQ", which is based on a MIPS
       chip, with Linux/Apache (I think kernel 2.0 and Apache 1.3.3).
       Zope's object database seems to answer some of my other concerns,
       but I am concerned that it would be difficult to get up and
       running in an environment I am not too familiar with to start
       with.  The same problem may apply to something like MySQL.  On
       the other hand, using shelve is "free" once I have Python running.

    4. Reliability.  I will be implementing this application for a
       client or clients who consider their data important.  I will
       certainly arrange a backup schedule of the data, but even so,
       anything that could wind up with corrupted files would be very
       bad... and more corruption is worse than less, if it does happen
       (i.e. separate user files are likely to confine any corruption to
       isolated users).

    5. Speed.  I do not want to hit a server bottleneck in delivering
       pages.  CGI is not acclaimed for its huge efficiency, but I am
       not sure if that matters.  My first client might involve up to a
       few hundred simultaneous users, while subsequent clients might
       push this up to the thousand range.  I am not really clear about
       the relative merits and compatibility of options like FastCGI,
       mod_py, or other ways of speeding up Python CGIs. I am also not
       sure about the relative speed issues in the different persistence
       options.
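
The separate-files alternative from points 1, 2 and 4 could be sketched
as follows.  This is only my guess at a workable layout, not something I
have tested on the target box: the ".pkl" and ".lock" naming, the
function names, and the use of fcntl.flock (Unix only) are all
assumptions.  Each write goes to a temporary file that is then renamed
over the old one, so a reader should never see a half-written object.

```python
import fcntl
import os
import pickle
import tempfile


def user_path(data_dir, username):
    # Usernames are assumed to be safe file names (they come from Apache auth).
    return os.path.join(data_dir, username + ".pkl")


def read_user(data_dir, username, default=None):
    """Load one user's object; no lock is needed because writes are atomic."""
    try:
        with open(user_path(data_dir, username), "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def update_user(data_dir, username, change):
    """Apply change(old_obj) -> new_obj under a per-user exclusive lock."""
    path = user_path(data_dir, username)
    with open(path + ".lock", "w") as lock:
        # Only writers to this one user wait here; other users are unaffected.
        fcntl.flock(lock, fcntl.LOCK_EX)
        new_obj = change(read_user(data_dir, username))
        # Write to a temp file in the same directory, then rename over the old.
        fd, tmp = tempfile.mkstemp(dir=data_dir)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(new_obj, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    # Closing the lock file releases the lock.
    return new_obj


def summarize(data_dir, extract):
    """Build {username: extract(obj)} by reading every user file in turn."""
    result = {}
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(".pkl"):
            username = name[:-len(".pkl")]
            result[username] = extract(read_user(data_dir, username))
    return result
```

The summarize function is where I would expect this layout to lose to a
shelve or SQL, since it opens one file per user.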

Any comments or thoughts on any of these issues will be much
appreciated.

Yours, Lulu...

--
quilty    _/_/_/_/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY:_/_/_/_/ v i
@ibm.   _/_/                    Postmodern Enterprises         _/_/  s r
net    _/_/  MAKERS OF CHAOS....                              _/_/   i u
      _/_/_/_/_/ LOOK FOR IT IN A NEIGHBORHOOD NEAR YOU_/_/_/_/_/    g s