CGIs and file exclusion

Sat Nov 6 10:51:28 EST 2004

Tim Peters <tim.peters at gmail.com> wrote in message news:<mailman.5991.1099723629.5135.python-list at python.org>...

> What you got above is
> a "write conflict error", and is normal behavior.  What happens:
> 
> - Process A loads revision n of some particular object O.
> - Process B loads the same revision n of O.
> - Process A modifies O, creating revision n+1.
> - Process A commits its change to O.  Revision n+1 is then current.
> - Process B modifies O, creating revision n+2.
> - Process B *tries* to commit its change to O.
> 
> The implementation of commit() investigates, and effectively says
> "Hmm.  Process B started with revision n of O, but revision n+1 is
> currently committed.  That means B didn't *start* with the currently
> committed revision of O, so B has no idea what might have happened in
> revision n+1 -- B may be trying to commit an insane change as a
> result.  Can't let that happen, so I'll raise ConflictError".  That
> line of argument makes a lot more sense if more than one object is
> involved, but maybe it's enough to hint at the possible problems.
> 
> Anyway, since your store() method always picks on the root object,
> you're going to get ConflictErrors frequently.  It's bad application
> design for a ZODB/ZEO app to have a "hot spot" like that.
> 
> In real life, all ZEO apps, and all multithreaded ZODB apps, always do
> their work inside try/except structures.  When a conflict error
> occurs, the except clause catches it, and generally tries the
> transaction again.  In your code above, that isn't going to work well,
> because there's a single object that's modified by every transaction
> -- it will be rare for a commit() attempt not to give up with a
> conflict error.
> 
> Perhaps paradoxically, it can be easier to get a real ZEO app working
> well than one's first overly simple attempts -- ZODB effectively
> *wants* you to scribble all over the database.
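
The commit sequence quoted above can be sketched as a toy model. This is
only an illustration of the revision check, not ZODB's actual
implementation: ToyStore and its methods are hypothetical names, and
ConflictError here is a local stand-in class rather than the real ZODB
exception.

```python
class ConflictError(Exception):
    """Local stand-in for the real exception, so the sketch runs alone."""


class ToyStore:
    """Holds one object O plus the number of its committed revision."""

    def __init__(self, value):
        self.value = value
        self.revision = 0

    def load(self):
        # A process remembers which revision it started from.
        return self.value, self.revision

    def commit(self, new_value, base_revision):
        # Refuse the commit if somebody else committed in the meantime.
        if base_revision != self.revision:
            raise ConflictError(
                "started from revision %d, but revision %d is current"
                % (base_revision, self.revision)
            )
        self.value = new_value
        self.revision += 1


store = ToyStore([])
a_value, a_rev = store.load()   # Process A loads revision n
b_value, b_rev = store.load()   # Process B loads the same revision n
store.commit(a_value + ["mail seen by A"], a_rev)   # A commits first
try:
    store.commit(b_value + ["mail seen by B"], b_rev)   # B is now stale
except ConflictError as exc:
    print("B must reload and retry:", exc)
```

Because every writer in this model touches the same single object, any two
overlapping writers collide, which is the "hot spot" described in the quote.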

Ok, I understand what you are saying, but I do not understand how I would
solve the problem. This is interesting to me since it has to do with a real
application I am working on. Maybe I should give some background.

We have an application where the users can interact with the system via
a Web interface (developed in Zope/Plone by other people) and via email. I am
doing the email part. We want the email part to be independent from the
Zope part, since it must also act as a safety belt (i.e. even if the Zope
server is down for any reason, the email part must continue to work).

Moreover, people with slow connections may prefer the email interface
over the Zope/Plone interface, which is pretty heavyweight. So, it must be
there. We expect few emails to come in (<100 per hour), so I just
modified /etc/aliases and each mail is piped to a simple Python script which
parses it and stores the relevant information (who sent the email, the date,
the content, etc.).

Input coming via email or via the web interface should go into
the same database. Since we are using Zope anyway and there is not
much writing to do, we thought of using the ZODB, and in particular ZEO, to
keep it independent from the main Zope instance. We could use another
database if needed, but we would prefer to avoid additional dependencies
and installation issues.

The problem is that occasionally two emails (or an email and a web submission)
can arrive at the same time. At the moment I just catch the error and send
back an email saying "Sorry, there was an internal error. Please retry later".
This is rare but it happened during the testing phase. I would rather avoid
that. I thought about catching the exception and waiting a bit before retrying,
but I am not completely happy with that; I also tried a hand-coded solution
involving a lock file but it was not 100% reliable. So I ask here if the ZODB
has some smart way to solve that, or if it is possible to change the design in
such a way as to avoid those concurrency issues as much as possible.
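
The "catch the exception and wait a bit before retrying" idea mentioned
above could look roughly like the following. It is a minimal sketch under
stated assumptions, not a recommended ZODB recipe: run_with_retry and its
parameters are hypothetical names, ConflictError is a local stand-in for the
real exception, and work() is assumed to perform the whole transaction
(load, modify, commit) each time it is called. With the real ZODB the failed
transaction would presumably also have to be aborted before the next attempt.

```python
import random
import time


class ConflictError(Exception):
    """Local stand-in for the real exception, so the sketch runs alone."""


def run_with_retry(work, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call work() until it succeeds or the attempts are used up.

    work() must redo the whole transaction from scratch on every call,
    so that it always starts from the currently committed state.
    """
    for attempt in range(attempts):
        try:
            return work()
        except ConflictError:
            if attempt == attempts - 1:
                # Out of attempts: let the caller send the "retry later" mail.
                raise
            # Randomised, growing delay, so that two processes which just
            # collided are unlikely to collide again on the next attempt.
            sleep(base_delay * (2 ** attempt) * random.random())
```

The random factor matters more than the exact delay: with a fixed wait, two
scripts started by simultaneous emails would simply retry at the same moment.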

Another concern of mine is security. What happens if a malicious user
sends 10000 emails at the same time? Does the mail server (which can be
postfix or exim4) spawn tons of processes until we run out of memory and the
server crashes? How would I avoid that? I can think of various hackish
solutions but I would like something reliable.
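
As an illustration of one such solution, the script itself could cap how
many copies do real work at once by grabbing one of a fixed number of lock
files. This is only a sketch, assuming a Unix system where fcntl.flock is
available; acquire_slot, the slot file names and the default of 4 slots are
all hypothetical, and it does not stop the mail server from starting the
processes in the first place.

```python
import fcntl
import os


def acquire_slot(lock_dir, max_slots=4):
    """Return an open, locked file for a free slot, or None if all are busy.

    The lock is held for as long as the returned file stays open, and is
    released automatically when it is closed or the process exits.
    """
    for n in range(max_slots):
        path = os.path.join(lock_dir, "slot%d.lock" % n)
        handle = open(path, "a")
        try:
            # Non-blocking exclusive lock: fails at once if the slot is taken.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            handle.close()
            continue
        return handle
    return None
```

A script that gets None back could exit immediately with a temporary-failure
status (os.EX_TEMPFAIL, i.e. 75) instead of parsing the mail; whether the
mail server then re-queues the message should be checked against the postfix
or exim4 documentation.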

Any hints? Suggestions?

Thanks,

  Michele Simionato