[Persistence-sig] "Straw Baby" Persistence API

Phillip J. Eby pje@telecommunity.com
Mon, 22 Jul 2002 21:12:20 -0400


At 03:47 PM 7/22/02 -0400, Jim Fulton wrote:
>Phillip J. Eby wrote:
>
> > But IMHO their use
>>is specific to persistence mechanisms which use "pickle jar"-style or 
>>"shelve"-like primitive databases.  (Primitive in the sense of not 
>>providing any concepts such as indexes or built-in search 
>>capabilities.)  If you have a higher-level mechanism, even one as simple 
>>as SleepyCat DB (aka Berkeley DB) b-trees, you're most often better off 
>>using those features of the backend.
>
>I don't agree.

I didn't qualify my statement sufficiently, then.  :)  See below.


>>If this were not true, there'd be no need for any persistence mechanisms 
>>besides ZODB, and we wouldn't be having this conversation.  :)
>
>There are lots of other reasons for a non-ZODB persistent storage
>including:
>
>1) Need to store data in relational databases
>
>    - Because they are trusted
>
>    - because data needs to be accessed from other apps
>
>    - because they may scale better for some apps

Right, and if you're doing it because of the second or third sub-item 
above, you will have little use for BTrees.  AFAICT, the only reason one 
would store a BTree in another BTree would be if you're doing ZODB-type 
things in an SQL db "because they are trusted".

This is part of what I meant by "most often better off using those 
[higher-level] features of the back-end."  Applications with different 
read/write characteristics and structural/performance requirements than 
content management applications will generally be *much* better off 
leaving these things to a good back-end than managing BTrees themselves.


>I think that there should, at least, be a standard cache interface.
>It should be possible to develop data managers and caches independently.
>Maybe we could include one or two standard implementations. These could
>provide useful examples for other implementations and, of course, be
>useful in themselves.

Sure.  I personally don't think there's much that you can standardize on in 
a caching API besides which mapping methods one is required to support, 
without getting into policy and use cases.  But I'm probably biased by the 
relative simplicity of my own use cases re: caching, and by my intense 
desire to get an "official" persistence base into the standard library, at 
the expense of any actual persistence *mechanisms* if need be.  I'm going 
to have to write my own mechanism anyway, so again I'm biased.  :)
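For what it's worth, here's a minimal sketch of what a "mapping methods only" cache interface might look like.  The class and method set are purely illustrative, not any proposed standard; the point is that eviction policy is deliberately left out:

```python
class MinimalCache:
    """A cache standardized only on mapping methods, keyed by oid.
    Eviction policy is deliberately left to the implementation."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, oid):
        return self._data[oid]

    def __setitem__(self, oid, obj):
        self._data[oid] = obj

    def __delitem__(self, oid):
        del self._data[oid]

    def __contains__(self, oid):
        return oid in self._data

    def __len__(self):
        return len(self._data)
```

Anything beyond this (activation counts, sweep intervals, and so on) starts to encode policy.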


>>>>* Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
>>>>way such that it works whether you have Interface or not", but the reality
>>>>is that a dependency in the standard library on something outside the
>>>>standard library is a big no-no, and just begging for breakage as soon as
>>>>there *is* an Interface package (with a new API) in the standard library.
>>>
>>>I think that this is a very bad idea. I think the interfaces clarify things
>>>quite a bit.
>>
>>I think maybe I was unclear.  I certainly don't think that the interfaces 
>>should cease to exist, or that they should not exist as 
>>documentation.  I'm referring to their inclusion as operating code, only.
>
>So you don't want them to get imported?

It's not that I care one way or the other.  Honestly, I'd rather see 
Interface end up in the standard library too - at least once the metaclass 
bug is fixed.  :)  But my overriding priority here is a standard for 
Persistence and Transaction bases for eventual inclusion in the standard 
library.

I have many projects which desperately need good persistence and 
transaction frameworks, but I'm between a rock (ZODB 3) and a hard place 
(ZODB 4) right now.  Both have transaction APIs that are somewhat 
difficult to work with, and I need some of the things that are in ZODB 4, 
but if ZODB 4 is about to be re-factored...  I'm stuck in the middle with 
code that could end up orphaned.  Even if I go off and write everything I 
need "from scratch" in order to dodge this dependency, it doesn't help 
me if the eventual standard doesn't match up closely enough with my 
work.  I'm still left with "orphaned" code - sort of like a DB connection 
object created prior to adoption of a DBAPI standard.

Thus, my objective is to keep the shortest possible distance between me and 
a Python community consensus on a base-level transaction and persistence 
API.  I have a fairly limited time window, however, before I will have to 
pick something and do something, regardless of the long-term cost.  :(


>I was mainly referring to the handling of non-persistent mutable
>objects. This is a major stumbling block and source of errors
>for most ZODB users.

Yeah, that one really requires metadata, or collaborative properties.  But 
those are things that are also already in PEAK, so again I'm probably 
biased as to how difficult/available they are.

Also, in the SQL world, the solution to non-persistent mutable data is 
actually quite trivial: don't have non-persistent mutable 
data.  :)  Seriously, since a data manager loads an object's state, it can 
*guarantee* that there will be no non-persistent mutable attributes.  (Note 
that if the object replaces a persistent mutable with a non-persistent one, 
that will trigger a change, and the data manager can force it back to a 
persistent mutable when the state goes back to "up to date".)
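As a rough illustration of that guarantee -- with `PersistentList` standing in as a hypothetical persistence-aware type, and plain lists as the only mutables handled:

```python
class PersistentList(list):
    """Hypothetical persistence-aware list; a real one would
    notify its data manager on mutation."""


def force_persistent_mutables(obj):
    """When the data manager loads or saves an object's state,
    replace any plain mutable attribute values with persistence-
    aware equivalents, so a non-persistent mutable can't survive
    past the point where the state goes back to "up to date"."""
    # Copy the items so we can safely replace values mid-walk.
    for name, value in list(vars(obj).items()):
        if type(value) is list:
            setattr(obj, name, PersistentList(value))
```

A real data manager would of course handle dicts and other mutables too, driven by its schema knowledge.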

In the SQL world, a data manager *must* have this sort of schema knowledge 
in order to do its job.  Pickle-driven data managers may have a harder time 
of this, of course, if they lack sufficient schema knowledge to manage 
object state in this fashion.

Then again, perhaps we could solve the problem for pickle-driven databases 
as well, if there were a Python protocol for declaring immutability!  Heck, 
in theory, one could use interface adaptation to transform objects like 
lists into persistent equivalents.  It would only be necessary to do this, 
however, if the object whose state was being loaded didn't declare that it 
handled its own persistence properly.

The performance/space issue of saving extra persistent objects could 
actually be dealt with by having the substituted objects implement only 
observation on behalf of their holder(s), rather than being actual 
persistent objects.
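A tiny sketch of that idea, with `ObservedList` and `Holder` as made-up names: the substituted object merely reports mutations to its holder, and is not itself a persistent object:

```python
class ObservedList(list):
    """A substitute mutable that is not itself persistent; it
    only implements observation on behalf of its holder."""

    def __init__(self, iterable, holder):
        super().__init__(iterable)
        self._holder = holder

    def append(self, item):
        super().append(item)
        self._holder.changed()   # notify the holder, not a data manager


class Holder:
    """Stand-in for a persistent object holding a mutable."""

    def __init__(self):
        self.dirty = False
        self.data = ObservedList([], self)

    def changed(self):
        self.dirty = True
```

A complete version would wrap every mutating method (`__setitem__`, `extend`, and so on), but the space savings come from the wrapper carrying no persistence machinery of its own.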


>I agree that this is hard. It's really hard. I wasn't even suggesting
>that we needed to solve this problem. I was merely pointing out that this
>*is* a big deal for a lot of people.

Understood.

Ironically enough, I think I have stumbled onto another mechanism for 
doing so, above.

Newly created objects and their subobjects won't be observed, of course, 
but that's moot since they have to be referenced from another persistent 
object to get saved at all.  In "rootless" persistence mechanisms (such as 
most SQL databases), the data manager has to explicitly add the object anyhow.

So it seems that all that's needed is sufficient introspection capability 
to distinguish between:

* A persistent object
* An immutable
* An "observed" mutable
* An "unobserved" mutable

With the ability to substitute a suitable observed mutable for an 
unobserved one, when state is loaded or saved.
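A crude sketch of such a classification -- using ZODB's `_p_jar` convention to spot persistent objects, and a made-up `_observed` flag for the third category; a real version would rely on a declared protocol rather than type checks:

```python
# Types whose instances can be shared freely without observation.
IMMUTABLES = (int, float, str, bytes, tuple, frozenset, type(None))


def classify(value):
    """Rough four-way classification for state load/save time."""
    if getattr(value, '_p_jar', None) is not None:
        return 'persistent'
    if isinstance(value, IMMUTABLES):
        return 'immutable'
    if getattr(value, '_observed', False):
        return 'observed mutable'
    return 'unobserved mutable'
```

One gotcha is already visible: a tuple of lists would classify as immutable while holding unobserved mutables, which is exactly the kind of detail the devil lives in.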

I'm going to think about this some more...  It seems altogether too easy, 
so I'm sure there's something I'm missing.  Most likely, it's just that the 
devil is in the details...  the specific issues of introspection, 
selection, and substitution are likely to have lots of little gotchas.


>>If our goal is to provide a Python core package for this in a speedy 
>>timeframe -- say this summer -- I think that developing and debugging a 
>>whole new way of doing things like this is probably out of the question.
>
>Agreed. OTOH, it wouldn't hurt to ponder other alternatives, if not now,
>then maybe later.

I admit I do enjoy trying to solve the problem.  I'm just not optimistic 
about finding a simple solution.  :)


>>Thing is, *we don't have to actually solve this problem*.  If we create a 
>>decent base API/implementation, there's no reason people can't create the 
>>proxies or class-substitution mechanisms on their own, using the base 
>>implementation to do the actual persistence part.  In principle, it 
>>should be possible to create such a mechanism for arbitrary data managers.
>
>True. But maybe someone will think of a way to solve this without proxies
>or alchemy?

Unless you're going to fundamentally alter the Python object model, it's 
not doable.  Python objects by definition get their behavior from their 
type.  To change the behavior, you must either change the type, the type 
pointer in the object, or replace the object with another one.
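In pure Python, the "change the type pointer" option looks like this (toy classes, just to show the mechanics; the third option is simply binding the name to a different object):

```python
class Plain:
    def greet(self):
        return 'plain'


class Tracked(Plain):
    """Same layout as Plain, so the type pointer can be swapped."""
    def greet(self):
        return 'tracked'


obj = Plain()
assert obj.greet() == 'plain'

# Change the type pointer in the object: behavior changes in place,
# and every existing reference to obj sees the new behavior.
obj.__class__ = Tracked
assert obj.greet() == 'tracked'
```

This only works between classes with compatible instance layouts, which is one reason class substitution in persistence frameworks is so constrained.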



>>I'd like to rephrase that as being it notifies, *if* it has been 
>>requested to do so by the data manager.  The data manager may decide to 
>>turn on or off such notifications at will.  (In other words, I want my 
>>post-getattr hook function that can modify the result of the getattr, and 
>>I want it removable so I don't continue to pay in performance once all my 
>>state is loaded.)
>
>We need to think some more about this. I'd rather err on the side of
>simple persistent objects and complex data managers.

So would I, which is why I want the hook, so the data manager can provide 
the behavior, rather than building it into the object.  :)
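The "removable hook" effect can be sketched in pure Python with `__getattr__`, which by definition fires only when normal lookup fails -- so once the loaded state lands in `__dict__`, the hook is never consulted again and fully loaded objects pay nothing.  Names here are illustrative:

```python
class Ghost:
    """Sketch of a data-manager-supplied, self-removing load hook."""

    def __init__(self, loader):
        self._loader = loader   # callable supplied by the data manager

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e.
        # while the object's state hasn't been loaded yet.
        self.__dict__.update(self._loader())
        try:
            return self.__dict__[name]
        except KeyError:
            raise AttributeError(name) from None
```

The C-level hook discussed above would be faster still, but the shape is the same: the data manager provides the behavior, and the object sheds it once the state is in place.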


>I'd also like persistent objects to be as lightweight as possible.
>Carrying a bunch of attributes for hooks is worrisome.

Hm.  Well, we're talking C-level slots here, and I only asked for one hook, 
myself.  Guido suggested the setattr hook.  :)  I like lightweight in 
*performance*, and having a callable C function seems lighter in that sense 
than having the object look up an attribute on the data manager every time 
an attribute lookup is performed on it.  Plus, the hook can be stateful, 
while a method on the data manager has to check state - which could require 
a re-entrant attribute lookup back to the object.


>>>     o The persistent object calls a method on the data manager when 
>>> it's state
>>>       needs to be loaded.
>>
>>As long as I still have the ability to set or remove a getattr-hook that 
>>works independently of this, I'm fine.
>
>Would different objects in the same DM have different values of the same hook?

Different values, yes.  Different non-empty values, probably not.  In other 
words, I'm mainly interested in having the hook be "on" or "off" for a 
given data manager.


>If so, why?

I have only one use case for having different non-empty hook values for the 
same DM: polymorphism.  But there are other ways to achieve it, so I 
don't think different non-empty values per DM is a requirement.  I suppose 
you could then implement the hook as a bit flag rather than a hook pointer, 
but it seems to me the performance might be worth using a pointer instead 
of a bit flag.


>A decent cache is going to handle objects differently based on their states.
>For example, a cache that deactivates objects when they haven't been used in a
>while needs to know which objects are ghostifyable and needs to know when
>ghostifyable objects have changed.

So add "sticky"/"unsticky" messages, and we'd be done.  Or, if "stickiness" 
represents a minority state among ghostable objects, don't even add this, 
because it'd be more efficient for the cache to just ask the object to 
deactivate itself and see what happens, than to send lots of "I'm 
sticky...  whoops, now I'm not" messages to data managers.
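A sketch of that "just ask and see what happens" sweep, borrowing ZODB's `_p_deactivate` name but nothing else -- `CacheEntry` and its attributes are made up for illustration:

```python
class CacheEntry:
    """Stand-in for a cached persistent object."""

    def __init__(self, sticky=False):
        self.ghost = False
        self.sticky = sticky

    def _p_deactivate(self):
        """Ghostify unless currently sticky; a sticky object
        simply ignores the request."""
        if not self.sticky:
            self.ghost = True


def sweep(entries):
    # The cache just asks each object to deactivate; no
    # sticky/unsticky message traffic is needed.
    for entry in entries:
        entry._p_deactivate()
```

The cost of an ignored request is one no-op call, versus a stream of notifications the data manager must track.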

With the messages I listed previously, a data manager should have enough 
information.  I'd rather we try to implement some data managers or caches 
and find we need to add something, than add a YAGNI on this one, because 
the performance penalty for unnecessary notifications seems potentially 
high, not to mention the added complexity for data managers to handle a 
bunch of extra messages.


>>I've spent a lot of time hacking around the existing packages to do 
>>SQL/LDAP stuff, and others here should have strong experience using ZODB 
>>for its "natural" backends and application structures.  That means we 
>>should be able to get pretty concrete about what is and isn't needed.
>>In the absence of more use cases, I'm not sure what else is really needed 
>>besides what we've already discussed.  Indeed, most of what I've outlined 
>>has been stuff I think should be taken *out*.
>>To put it another way, I think we should have to justify everything we 
>>want to put *in*, not what we take out.  Python standard library modules 
>>are widely distributed, and have a long life.  Whatever we put in needs 
>>to have a healthy life expectancy!
>
>I don't think we should approach this effort with the assumption that
>the first version is going into the standard library. I'm pretty happy
>with the persistence mechanism I came up with for ZODB, but there are a
>lot of things I'd like to fix.

As I mentioned above, my primary goal is just to get a consensus for the 
basic interfaces.  I'd be happy if we end up with something like a DBAPI 
PEP that everybody agreed on.  The standard library is gravy, but I *do* 
want to see it there before too terribly long.  IOW, I'd like this to be 
like the XML processing package and distutils, which were separately 
distributed for a (Python) release or two as candidates for the standard 
library, and became standard later.