[Persistence-sig] "Straw Baby" Persistence API

Mon, 22 Jul 2002 13:16:46 -0400

Phillip J. Eby wrote:
> Following on the unparalleled success of the "Straw Man" transaction API
> (he said, with tongue in cheek),

It seemed pretty successful to me.

> I thought it might be good to make a
> proposal for persistence as well.

Thanks. This is very helpful.

> Since I won't be at the BOF,

We'll miss you.

> I figure I
> should get my two cents in now, while the getting's good.
> 
> Here's my proposal, such as it is... Deliver a Persistence package based on
> the one at http://cvs.zope.org/Zope3/lib/python/Persistence/ but with the
> following changes:
> 
> * Remove the BTrees subpackage, and the Class, Cache, Function, and Module
> modules, along with the ICache interface.  Rationale: The BTrees package is
> only useful for a relatively small subset of possible persistence backends,
> and is subject to periodic data structure changes which affect applications
> using it. 

I'm OK with taking out BTrees. Note, however, that BTrees were included in
ZODB by very popular demand.

You haven't given a rationale for not including the caching framework.
The caching framework is closely tied to persistence and, I think,
largely independent of data managers.

> It's probably best kept out of the Python core.  Similar
> arguments apply to the Cache system, although not quite as strongly.
> Class, Function, and Module are very recent developments which have not had
> the extended usage that most of the rest of the code has. 

Fair enough.

> (Note: I don't
> mean to say that the persistence C code has been thoroughly exercised
> either, in the sense that much of it is completely new for Python 2.2.  But
> its *design* has a long history, and previous implementations have had much
> testing of the kind of edge and corner issues that the Class, Function, and
> Module modules haven't been exposed to yet.)
> 
> * I do think we should keep PersistentList and PersistentMapping in the
> core; they're useful for almost any kind of application, and any kind of
> back-end storage.  They don't introduce policy or data format dependencies
> into users' code, either.

I *never* use persistent list and almost never use persistent mapping.
I find BTrees far more useful. :)

> * Make _p_dm a synonym for _p_jar, and deprecate _p_jar.  This could be
> done by making a _p_jar descriptor that read/wrote through to _p_dm, and
> issued a deprecation warning.  I don't personally have a problem with
> _p_jar, but I've heard rumblings from other people (ZC folks?) that it's
> confusing or that they want to get rid of it.  So if we're doing it, now
> seems like the time.

I wouldn't worry about backward compatibility. Ditch '_p_jar' and pick
a better name, like '_p_manager' as you suggested.
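
If backward compatibility did turn out to matter, the read/write-through
descriptor Phillip describes might look roughly like the following. This is
only a sketch: DeprecatedAlias and the stand-in Persistent class are
hypothetical names, not part of the existing Persistence package, and the
real base class is implemented in C.

```python
import warnings


class DeprecatedAlias:
    """Data descriptor that forwards reads and writes to another attribute."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def _warn(self):
        # stacklevel=3 points the warning at the code touching the old name.
        warnings.warn(
            f"{self.old_name} is deprecated; use {self.new_name}",
            DeprecationWarning,
            stacklevel=3,
        )

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        self._warn()
        return getattr(obj, self.new_name)

    def __set__(self, obj, value):
        self._warn()
        setattr(obj, self.new_name, value)


class Persistent:
    # Hypothetical stand-in for the real base class.
    _p_dm = None
    _p_jar = DeprecatedAlias("_p_jar", "_p_dm")
```

Old code that reads or assigns _p_jar keeps working and sees a
DeprecationWarning, while new code uses _p_dm directly.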

> * Flag _p_changed *after* __setattr__, not before!  This will help
> co-operative transaction participants play nicely together, since they
> can't "write through" a change if they're getting notified *before* the
> change takes place! 

It would be helpful if you could provide an illustrative example in a separate
dedicated message.
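
As a starting point for that discussion, here is a toy sketch of the
ordering problem as I read it. All names here (RecordingManager,
object_changed, FlagBefore, FlagAfter) are made up for illustration and are
not the Persistence API; the "write through" is reduced to reading the
attribute at notification time.

```python
class RecordingManager:
    """Hypothetical data manager that writes a change through when notified."""

    def __init__(self):
        self.written = []

    def object_changed(self, obj, name):
        # Write-through: capture whatever value the object holds right now.
        self.written.append((name, obj.__dict__.get(name)))


class FlagBefore:
    """Notifies the manager before the attribute is actually stored."""

    def __init__(self, dm):
        object.__setattr__(self, "_dm", dm)

    def __setattr__(self, name, value):
        self._dm.object_changed(self, name)  # too early: old value is visible
        object.__setattr__(self, name, value)


class FlagAfter:
    """Notifies the manager after the attribute is stored."""

    def __init__(self, dm):
        object.__setattr__(self, "_dm", dm)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        self._dm.object_changed(self, name)  # new value is visible
```

With FlagBefore, a manager that writes through on notification records the
stale value (here, nothing at all); with FlagAfter it records the value that
was just assigned.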

> Docs should also clarify that when set in other code,
> _p_changed should be set at the latest possible moment, *after* the object
> is in its new, stable state.

I'm with Guido in wanting a set of API calls to replace the baroque
'_p_changed' semantics.

Note to both you and Guido: you (Phillip) are right that _p_state is an
internal implementation detail.

> * Keep the _p_atime slot, but don't fill it with anything by default.
> Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot
> at C level that's called after the getattribute completes.  A data manager
> can then set the hook to point to a _p_atime update function, *or* it can
> introduce postprocessing for "proxy" attributes.  That is, a data manager
> could set the hook to handle "lazy" loading of certain attributes which
> would otherwise be costly to retrieve, by placing a dummy value in the
> object's dictionary, and then having the post-call hook return a
> replacement value.

I suggest we step back a bit and think of the API in terms of events:
what events are generated, and who they are sent to. Your API change is
consistent with that.

> For speed, this will generally want to be a C function; let the base
> package include a simple hook that updates _p_atime, and another which
> checks whether the retrievedValue is an instance of a LazyValue base class,
> and if so, calls the object.  This will probably cover the basics.  A data
> manager that uses ZODB caching will use the atime function, and non-ZODB
> data managers will probably want the other hook.  I also have an idea about
> using the transaction's timestamp() plus a counter to supply a "time" value
> that minimizes system calls, but I'm not sure it would actually improve
> performance any, so I'm fine with not trying to push that into the initial
> package.  As long as the hook slot is present in the base package, I or
> anyone else are free to make up and try our own hooks to put in it.

I'd like to get rid of _p_atime, as it is totally dependent on a particular
cache implementation, which we happen to be phasing out.
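
For reference, the LazyValue-style post-getattr hook Phillip describes can be
sketched in pure Python as below. The proposal is for a C-level slot, so this
only illustrates the behavior. The Hooked class, the per-instance hook
assignment, and the decision to cache the loaded value in the instance
dictionary are my assumptions, not part of the proposal.

```python
class LazyValue:
    """Placeholder stored in an object's __dict__ until first access."""

    def __init__(self, loader):
        self.loader = loader

    def __call__(self):
        return self.loader()


def lazy_hook(obj, name, value):
    """Post-getattr hook: swap a LazyValue for the value it loads."""
    if isinstance(value, LazyValue):
        value = value()
        obj.__dict__[name] = value  # assumption: cache so the loader runs once
    return value


class Hooked:
    """Hypothetical base class that calls a hook after attribute lookup."""

    _p_getattr_hook = None

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if name.startswith("_"):
            # Leave _p_* and special attributes alone to avoid recursion.
            return value
        hook = object.__getattribute__(self, "_p_getattr_hook")
        return hook(self, name, value) if hook is not None else value
```

A data manager would install the hook on the object (for example
obj._p_getattr_hook = lazy_hook) and place a LazyValue in the object's
dictionary for each attribute that is expensive to load.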

Persistent objects should have *no*

> * Get rid of the term "register", since objects won't "register" with the
> transaction, and neither should they with their data manager.  They should
> "inform their data manager" that they have changed.  Something like an
> objectChanged() message is appropriate in place of register().  I believe
> this would clarify the API.

That's fine.

> * Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
> way such that it works whether you have Interface or not", but the reality
> is that a dependency in the standard library on something outside the
> standard library is a big no-no, and just begging for breakage as soon as
> there *is* an Interface package (with a new API) in the standard library.

I think that this is a very bad idea. I think the interfaces clarify things
quite a bit.

> Whew!  I think that about covers it, as far as what I'd like to see, and
> what I think would be needed to make it acceptable for the core.  Comments?
> 
> By the way, my rationale for not taking any radical new approaches to
> persistence, observation, or notification in this proposal is that the
> existing Persistence package is "transparent" enough, and has the benefit
> of lots of field experience.  I spent a lot of time trying to come up with
> "better" ways before writing this; mostly I found that trying to make it
> more "transparent" to the object being persisted, just pushes the
> complexity into either the app or the backend, without really helping
> anything.  It's not a really big deal to:
> 
> 1. Subclass Persistent
> 
> 2. Use PersistentList and PersistentMapping or other Persistent objects for
> your attributes, or set self._p_changed when you change a non-persistent
> mutable.

These are not a big deal to you, because you have a deep understanding of,
and interest in, the machinery. They are a big deal to most people. It would
be *wonderful* if we could avoid this. Maybe if we had a standard persistence
framework, we could motivate language changes that made this cleaner. :)

> 3. Use transactions
> 
> Especially if that's all you need to do in order to have persistence to any
> number of backends, including the current ZODB and all the wonderful SQL or
> other mappings that will be creatable by everybody on this list using their
> own techniques.  The key is not so much "transparency" per se, as
> *uniformity* across backends.  I think the existing API is transparent
> enough; let's work on having uniform and universal access to it, as a
> Python core package.

Transactions are a huge benefit, as opposed to something that is "not
really a big deal". :)

Here are some additional points:

- While we should provide a standard implementation of a persistence
   *interface*, we should allow other implementations. For example, the
   data manager or cache should not depend on internal details of the
   persistence implementation, and we should not require a specific C
   layout for persistent objects.

- The persistence interface and implementations should be independent of
   the cache implementations (e.g. no _p_atime). We *do* need to provide
   a better API for handling objects that are unwilling to be deactivated.
   Perhaps _p_deactivate should return a value indicating whether the object
   was deactivated, and, if not, perhaps why.

- We need to define the state model for persistent objects. I'd like to include
   the notion of a persistent refcount. Possible states are:

   o Unsaved

   o Up to date

   o Changed

   o Ghost

   In addition, there is a persistent reference count. This is used by C code
   to indicate that the object is being used outside of Python. An object
   can't be turned into a ghost if its persistent reference count is > 0.
   We'll model the reference count as a "sticky" state. We transition to the
   sticky state when the reference count becomes non-zero and from the sticky
   state when the reference count drops to zero. This state is largely
   independent of the other states.

- I'd like to spend some time thinking through persistence related events.
   Here's a start:

     o When a persistent object is modified while in the up-to-date state,
       it should notify its data manager and transition to the changed state.

     o When the object is accessed, it should notify its data manager. Perhaps it
       should pass its current state.

     o The persistent object calls a method on the data manager when its state
       needs to be loaded.

     o The persistent object should probably notify the data manager of any state
       changes.
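
To make the last few points easier to discuss, here is a rough Python sketch
of the state model, the "sticky" reference count, a _p_deactivate that
reports whether it succeeded, and the events listed above. Everything in it
is hypothetical: the class and method names (State, EventLog,
PersistentSketch, load_state, accessed, state_changed) are placeholders, and
the rule that only up-to-date objects may become ghosts is an assumption. The
Unsaved state is listed but not exercised.

```python
from enum import Enum


class State(Enum):
    UNSAVED = "unsaved"
    UP_TO_DATE = "up to date"
    CHANGED = "changed"
    GHOST = "ghost"


class EventLog:
    """Hypothetical data manager that only records the events it receives."""

    def __init__(self, stored):
        self.stored = stored
        self.events = []

    def load_state(self, obj):
        self.events.append("load")
        obj.data = dict(self.stored)

    def accessed(self, obj, state):
        self.events.append(("accessed", state))

    def state_changed(self, obj, old, new):
        self.events.append((old, new))


class PersistentSketch:
    def __init__(self, dm):
        self.dm = dm
        self.state = State.GHOST
        self.data = None
        self.persistent_refcount = 0  # non-zero means "sticky"

    @property
    def sticky(self):
        return self.persistent_refcount > 0

    def _transition(self, new):
        old, self.state = self.state, new
        self.dm.state_changed(self, old, new)

    def _activate(self):
        # Ask the data manager for state only when the object is a ghost.
        if self.state is State.GHOST:
            self.dm.load_state(self)
            self._transition(State.UP_TO_DATE)

    def get(self, key):
        self._activate()
        self.dm.accessed(self, self.state)
        return self.data[key]

    def set(self, key, value):
        self._activate()
        self.data[key] = value
        if self.state is State.UP_TO_DATE:
            self._transition(State.CHANGED)

    def _p_deactivate(self):
        """Try to become a ghost; return (deactivated, reason)."""
        if self.sticky:
            return False, "sticky"
        if self.state is not State.UP_TO_DATE:
            return False, self.state.value
        self.data = None
        self._transition(State.GHOST)
        return True, None
```

The sticky check is kept separate from the State enum because the reference
count is largely independent of the other states: an object can be sticky
while up to date or while changed.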

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org