[Persistence-sig] ACID, savepoints, and exceptions (was re: "Straw Man" transaction API)

Sun, 18 Aug 2002 11:46:53 -0400

At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote:
>Last week, I worked out a revised transaction API for user code and
>for data managers.  It's implemented in ZODB4, but is fairly
>preliminary code.  I imagine we'll revise it further, but I'd like to
>describe the changes briefly.

During the past week, I've been writing a TransactionService for PEAK, 
specifically designing it to allow interaction/adaptation to the new ZODB4 
transaction API, and extending it to support the multi-prepare and durable 
subscription models that I need for my applications and framework 
projects.  I believe I've largely been successful, but the work has 
highlighted for me some open issues and ambiguities in the ZODB4 
transaction API as it stands right now, relating to error handling and 
savepoints.

>class IRollback(Interface):
>
>     def rollback():
>         """Rollback changes since savepoint."""
>
>I think the rollback mechanism will work well enough.  Gray and Reuter
>explain that it can be used to simulate a nested transaction
>architecture.  Thus, I think it's a reasonable building block for the
>nested transaction API.

In my API I've standardized on a 'CannotRevertException' when rollback to a 
savepoint is not possible, and added a 'NullSavepoint' object which can be 
returned by an object that has nothing to do on rollback.
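
As a rough illustration only (this is not the actual PEAK or ZODB4 code), 
the two pieces might look like the sketch below.  'CannotRevertException' 
and 'NullSavepoint' are the names mentioned above; 'LogOnlySavepoint' is a 
made-up participant that cannot undo its work.

```python
class CannotRevertException(Exception):
    """Raised when rollback to a savepoint is not possible."""


class NullSavepoint:
    """Savepoint for a participant that has nothing to do on rollback."""

    def rollback(self):
        # Nothing was changed since the savepoint, so nothing to undo.
        pass


class LogOnlySavepoint:
    """Hypothetical savepoint for a resource that cannot undo its work."""

    def rollback(self):
        raise CannotRevertException("output already written; cannot revert")
```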

An open issue that needs to be addressed, however, is the question of 
rolling back more than once to the same savepoint.  In some ways, it's a 
very handy capability, but I'm not sure which databases support it.  I'm 
therefore inclined to have the API state explicitly that a savepoint can be 
rolled back at most once (since some savepoints may not be able to be 
rolled back a second time).

Another open issue: what happens if a rollback fails?  Is the transaction 
"hosed" at that point?  What if five data managers roll back, and the sixth 
one fails?  This suggests adding a 'canRollback()' method to the interface, 
such that a rollback aggregator can check that its aggregated savepoints 
can actually be rolled back, so that "CannotRevert" errors don't cause the 
transaction to be hosed.  However, the issue of another type of exception 
occurring during rollback still must be addressed.
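
A minimal sketch of such a rollback aggregator, assuming every savepoint 
grows a 'canRollback()' method as suggested above.  'AggregateSavepoint' 
and 'DemoSavepoint' are hypothetical names used only for illustration.

```python
class CannotRevertException(Exception):
    """Raised when rollback to a savepoint is not possible."""


class DemoSavepoint:
    """Stand-in savepoint used only to exercise the aggregator."""

    def __init__(self, revertible=True):
        self.revertible = revertible
        self.rolled_back = False

    def canRollback(self):
        return self.revertible

    def rollback(self):
        if not self.revertible:
            raise CannotRevertException("cannot revert this savepoint")
        self.rolled_back = True


class AggregateSavepoint:
    """Rolls back a group of savepoints, checking all of them first."""

    def __init__(self, savepoints):
        self.savepoints = list(savepoints)

    def canRollback(self):
        return all(sp.canRollback() for sp in self.savepoints)

    def rollback(self):
        # Refuse up front, before any participant has been touched.
        if not self.canRollback():
            raise CannotRevertException("a participant cannot roll back")
        for sp in self.savepoints:
            # Any other exception raised here still leaves the group
            # partially rolled back; canRollback() cannot prevent that.
            sp.rollback()
```

With five revertible savepoints and one that is not, 'rollback()' raises 
before any of the six is touched, which avoids the "five succeed, sixth 
fails" case for CannotRevert errors only.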

>I think I'm also in favor of the new abort semantics.  ZODB3 would
>abort the transactions -- call abort() on all the data managers -- if
>an error occurred during a commit.  The new code requires that the
>user do this instead.  I think that's better, because it leaves the
>state of the objects intact if the code wants to analyze what went
>wrong before retrying the transaction.

The interesting question here again is, is the transaction "hosed"?  Should 
there be a flag that says, "you can't do anything to this transaction but 
abort it"?

To put it in broader terms, if *any* exception is thrown during execution 
of a transaction-related method, should we consider the transaction 
unrecoverable?

I'm inclined to say yes, because I can think of too many code paths in both 
my transaction code and ZODB4's where it becomes nearly impossible to 
guarantee a "clean" state when an exception occurs.  By definition, if code 
called by the transaction system raises an exception, it is announcing that 
it cannot satisfy its contract with the transaction.  Therefore, the 
transaction cannot be certain of satisfying its contract with the 
application for a clean commit.
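
One way such a flag could work, sketched under assumptions: the 'hosed' 
attribute, the 'TransactionHosedError' class, and the argument-less 
'prepare()'/'commit()'/'abort()' calls on data managers are all 
simplifications invented for this example, not the ZODB4 signatures.

```python
class TransactionHosedError(Exception):
    """Hypothetical error: the transaction may only be aborted."""


class Transaction:
    def __init__(self, managers):
        self.managers = list(managers)
        self.hosed = False

    def _notify(self, method_name):
        if self.hosed:
            raise TransactionHosedError("transaction can only be aborted")
        try:
            for dm in self.managers:
                getattr(dm, method_name)()
        except Exception:
            # A participant broke its contract, so a clean commit can no
            # longer be promised.
            self.hosed = True
            raise

    def commit(self):
        self._notify("prepare")
        self._notify("commit")

    def abort(self):
        # Abort is the one operation that stays legal after a failure.
        for dm in self.managers:
            dm.abort()
```

The original exception still propagates to the caller, so the state can be 
inspected before aborting; only further commit attempts are refused.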

Another issue here is clean aborts.  If an error is raised by a data 
manager during abort, what should the semantics be?  Older ZODB transaction 
classes wrap every data manager abort call in a try-except that ensures 
that *all* the abort methods get called, even if several of them raise 
errors.   The new ZODB4 transaction API doesn't do this, and thus can fail 
to completely roll back a transaction.

Of course, the tradeoff is that the old code only gave you information 
about the first exception that occurred, and not any of the later 
ones.  Perhaps the answer is to make the transaction keep track of which 
data managers have received which messages, and to require the caller to 
keep 'abort()'-ing until all data managers have been aborted, even if each 
one raises errors?
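
A sketch of that "keep aborting" idea, with hypothetical names throughout 
('pending', 'aborted', 'abort_completely') and an argument-less 'abort()' 
on data managers assumed for brevity.

```python
class Transaction:
    def __init__(self, managers):
        # Data managers that have not yet been sent abort().
        self.pending = list(managers)

    @property
    def aborted(self):
        return not self.pending

    def abort(self):
        while self.pending:
            # Remove first: the manager counts as notified even if it raises.
            dm = self.pending.pop(0)
            dm.abort()


def abort_completely(txn):
    """Keep calling abort() until every manager has been notified."""
    errors = []
    while not txn.aborted:
        try:
            txn.abort()
        except Exception as exc:
            errors.append(exc)
    return errors
```

Each call to 'abort()' surfaces at most one error, so the caller sees every 
failure, at the price of having to write the retry loop itself.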

I don't really know what's "right" here.  If the first data manager's 
failure causes subsequent DMs to fail, what then?  How much retry and 
recovery logic code must somebody put into their application, in order to 
guarantee correctness and recovery?  Isn't that what the transaction API is 
*for*?

I guess my inclination at this point is to think that maybe the transaction 
needs to have some kind of log - not in the 'logging' module sense, but in 
the sense of a list of actions performed and errors that occurred.  These 
errors could then be wrapped up in another exception or a return value upon 
completion of operations like abort() and commit().  Then, if somebody 
wants to analyze it, they have all the data.
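
A minimal sketch of such a log, shown for abort() only.  The 
'TransactionErrors' exception, the '(operation, manager, error)' tuple 
layout, and the argument-less 'abort()' on data managers are assumptions 
made for the example.

```python
class TransactionErrors(Exception):
    """Hypothetical wrapper carrying the transaction's action log."""

    def __init__(self, operation, log):
        failures = [e for e in log if e[0] == operation and e[2] is not None]
        super().__init__("%d error(s) during %s" % (len(failures), operation))
        self.operation = operation
        self.log = list(log)


class Transaction:
    def __init__(self, managers):
        self.managers = list(managers)
        # One (operation, manager, error-or-None) entry per call made.
        self.log = []

    def abort(self):
        failed = False
        for dm in self.managers:
            try:
                dm.abort()
            except Exception as exc:
                # Record the error and keep going so every manager is told.
                self.log.append(("abort", dm, exc))
                failed = True
            else:
                self.log.append(("abort", dm, None))
        if failed:
            raise TransactionErrors("abort", self.log)
```

Every data manager is still told to abort, and the single exception raised 
at the end carries all of the errors rather than only the first.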

But I don't believe it makes sense for the application to try to correct 
errors "under the hood" of the transaction.  Data managers should handle 
their own errors, if there's any handling to be done.  Any analysis of the 
errors after the fact is going to be by a human being, to figure out how to 
fix the application or the data managers so they don't do whatever it is in 
the first place, or so that they catch the problem before it becomes an 
error in a commit or abort operation.