Object Database (ODBMS) for Python

Patrick K. O'Brien pobrien at orbtech.com
Fri Aug 29 16:24:39 EDT 2003


"Paul D. Fernhout" <pdfernhout at kurtz-fernhout.com> writes:

> Patrick K. O'Brien wrote:
> > Let me start by saying I'd love to cooperate, even if I am
> > competitive by nature.  ;-)
> 
> Nothing like a good controversy to get people paying attention. :-)

And never let the facts get in the way of a good story.  ;-)

> > This API looks rather verbose to me.  I think mine would look like:
> >>>> t = tx.Create('User', name='Sir Galahad')
> >>>> user = db.execute(t)
> 
> I think your notion of transactions is growing on me. :-) I can see how
> you can generalize this to construct a transaction in a view of a
> database, querying on DB + T1 + T2 etc. while they are uncommitted and
> then commit them all (perhaps resolving multiuser multitransaction
> issues on commits). Kind of neat concept, I'll have to consider for some
> version of the Pointrel System.
> 
> I think it is the special syntax of:
>   tx.Update(u1, name='Joe')
> or:
>   tx.Create('User', name='Sir Galahad')
> which I am recoiling some from.
> 
> I think part of this comes from thinking as a transaction as something
> that encloses other changes, as opposed to something which is changed.
> Thus my discomfort at requesting services from a transaction other than
> commit or abandon. I'm not saying maybe I couldn't grow to love
> tx.Update(), just that it seems awkward at first compared to what I am
> used to, as well compared to making operations on a database itself
> after having told the database to begin a transaction.

My use of the term "transaction" has certain subtleties that deserve
clarification.  First, a transaction is an instance of a Transaction
class (or subclass).  This instance must have an execute method that
will get called by the database (after the transaction instance gets
tested for picklability, and gets logged as a pickle).  That execute
method will be passed the root of the database.  It is then free to do
whatever it wants, as long as the sum total of what it does leaves the
database in a consistent state.  All transactions are executed
sequentially.  All changes made by a transaction must be
deterministic, in case the transaction gets reapplied from the
transaction log during a recovery, or when restarting a database that
wasn't dumped just prior to stopping.
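
To make that sequence concrete, here is a rough sketch of the execute
cycle just described.  The Database class and its attributes are
hypothetical; this is not the PyPerSyst source, just the
pickle-test/log/apply order expressed in code:

    import pickle

    class Database:
        def __init__(self, root, logfile):
            self.root = root
            self.logfile = logfile      # a file opened in binary append mode

        def execute(self, txn):
            # Pickling the transaction both proves it is picklable and
            # produces the log record in one step.
            pickle.dump(txn, self.logfile)
            self.logfile.flush()
            # Only now does the transaction touch the data, via the root.
            return txn.execute(self.root)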

At this point, PyPerSyst does not have commit/rollback capability.  So
it is up to each transaction instance not to leave the database
in an inconsistent state.  I'm looking into supporting
commit/rollback, but the simple solution there would double RAM
requirements, and other solutions are tricky, to say the least.  So
I'm still looking for something simple and elegant to fit in with the
rest of the framework.

The transactions I've shown, tx.Create, tx.Update, tx.Delete, are
simply generic classes that come with PyPerSyst to make it easy to
create, update and delete single instances of entities.  Most real
applications would define their own Transaction classes in addition to
these.
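
For example, a hypothetical application-defined transaction might look
like this (the Transaction base class is PyPerSyst's; the Account
entity, its balance attribute, and the lookup of an extent by class
name and an instance by oid are all invented for illustration).  Since
there is no rollback yet, it checks its preconditions before changing
anything:

    class TransferFunds(Transaction):
        """Move money between two accounts as a single step."""

        def __init__(self, from_oid, to_oid, amount):
            self.from_oid = from_oid
            self.to_oid = to_oid
            self.amount = amount

        def execute(self, root):
            accounts = root['Account']          # the Account extent
            source = accounts[self.from_oid]    # lookup by oid (assumed)
            target = accounts[self.to_oid]
            # No rollback exists, so validate before mutating anything.
            if source.balance < self.amount:
                raise ValueError('insufficient funds')
            source.balance -= self.amount
            target.balance += self.amount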

> I'm also left wondering what the read value of the "name" field is
> when accessed directly as "u1.name" after doing the "tx.Update()"
> and before doing the "db.execute()".

t = tx.Update() merely creates a transaction instance, providing it
with values that will be needed by its execute() method.  (See the GOF
Command pattern.)  So nothing changes until the transaction is
executed by the database, which happens when the transaction instance
is passed to the database's execute method:

db.execute(t)
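
A rough sketch of what such a generic Update command could look like
(illustrative only, not the actual PyPerSyst code):

    class Update(Transaction):
        def __init__(self, instance, **kwargs):
            # Nothing changes here; we only capture the intended change.
            # (A real implementation would more likely record the oid,
            # so the logged pickle stays small.)
            self.instance = instance
            self.kwargs = kwargs

        def execute(self, root):
            # The database calls this; only now do the attributes change.
            for name, value in self.kwargs.items():
                setattr(self.instance, name, value)
            return self.instance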

> [By the way, picky, picky, and I fall down on it too, but you use
> different capitalizations for those two functions.]

There aren't two functions: tx.Update is a class, db.execute is a
method.  The capitalization is correct.  ;-)

> So is it that in PyPerSyst there appears to be one way to access
> information (directly through the object using Python object
> attribute access dot syntax) [not sure about database queries?] and
> another way to change objects -- using tx.XYZ()? This mixing of
> mindsets could be confusing (especially within an object that
> changes its own values internally).

You could define transactions that do queries as well.  And some
people prefer to do that.  But I think for most reads it is easier to
traverse the db.root object.

If you use entities, and an instance of the Root class for your
db.root, then your db.root is a dictionary-like object that gets you
to the extent for each Entity subclass in your schema.  Each extent
is an instance of the Extent class and manages the set of all
instances of the Entity subclass it is responsible for.  The Extent
class is how I'm able to provide relational-like features.
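
Reads, then, are just attribute access and traversal.  Something along
these lines, where the lookup by class name and by oid is my shorthand
rather than the documented Extent API:

    users = db.root['User']         # the extent for the User entity
    galahad = users[42]             # look one instance up by its oid
    for user in users:              # iterate over every User instance
        print(user.name)            # plain attribute access, no transaction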

Inside of Entity instances, your code looks just like regular Python
code.  It's just application code that must go through transactions.
Sure, this mixing of mindsets is different from what people are used
to, but we're talking about managing valuable data.  If you simplify
things too much, you lose the integrity of your data.

> Using tx.Update also becomes an issue of how to convert existing
> code to persistent code.  Mind you, the Pointrel System can't do
> this transparently either, but it doesn't try to do it at all. The
> Pointrel System requires both looking up a value and storing it to
> use a different syntax. Is it just a matter of aesthetics about
> whether it is better to have the whole approach be unfamiliar or
> whether it is better to have only half of it be unfamiliar? Or is
> there something more here, some violation of programmer
> expectations? [See below.]

Existing code won't become magically persistent by adding PyPerSyst.

> > And unique ids (immutable, btw) are assigned by PyPerSyst:
> >>>> user.oid
> > 42
> 
> Being competitive here :-) I would love to know if you have a good
> approach for making them globally unique across all possible users
> of all PyPerSyst repositories for all time. The Pointrel has an
> approach to handle this (I don't say it will always work, or is
> efficient, but it tries). :-) Feel free to raid that code (BSDish
> license, see license.txt), but that issue may have other deeper
> implications for your system.

Sorry, nothing special here.  They are just incrementing ints unique
within each extent.  It would be easy to switch to a globally unique
id if you have a good one, as long as it is deterministic and not
random in any way.
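
One deterministic scheme, purely as an illustration, would be to
qualify the per-extent counter with a fixed database identifier, so
that replaying the log always reproduces the same ids:

    class OidAllocator:
        def __init__(self, db_id):
            self.db_id = db_id        # e.g. 'orbtech-demo', fixed per database
            self.counters = {}        # extent name -> last integer used

        def next(self, extent_name):
            n = self.counters.get(extent_name, 0) + 1
            self.counters[extent_name] = n
            # Deterministic: the same sequence of calls yields the same ids.
            return '%s/%s/%d' % (self.db_id, extent_name, n)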

> > And you can still access attributes directly, you just can't
> > change them outside of a transaction:
> >
> >>>> user.name
> > 'Sir Galahad'
> > And the generic Update transaction is equally simple:
> >
> >>>> t = tx.Update(user, name='Brian')
> >>>> db.execute(t)
> >>>> user.name
> > 'Brian'
> 
> I know one rule of user interface design (not necessarily API of
> course) is that familiar elements should act familiar (i.e. a drop
> down list should not launch a dialog window on drop down) and that
> if you are going to experiment it should look very different so
> expectations are not violated.
> 
> The issue here is in part that when you can reference "u1.name" and
> then "u1.name = 'Joe'" generates an exception (instead of
> automatically making an implict transaction), some user expectation
> of API symmetry may be violated...

While this is feasible, the problem I have with it is that I think
implicit transactions at such a fine level of granularity are evil.
That's the main reason I haven't implemented this, even though others
have done this for PyPerSyst.  I think too many people would abuse the
implicit transaction feature, resulting in inconsistent and unreliable
objects.  I'm targeting serious, multi-user applications.  But
PyPerSyst is completely modular, so you can use it to implement all
kinds of persistence systems.  Most of the capabilities I've been
discussing are new, and completely optional.
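
For clarity, what those implicit-transaction layers do, roughly
speaking, is intercept attribute assignment and turn each one into its
own tiny Update.  This is my own sketch of the idea, not their code,
and it assumes the tx.Update and db objects shown earlier:

    class AutoPersist(object):
        """Wrap an instance so every assignment becomes a transaction."""

        def __init__(self, db, instance):
            object.__setattr__(self, '_db', db)
            object.__setattr__(self, '_instance', instance)

        def __getattr__(self, name):
            # Reads pass straight through to the wrapped instance.
            return getattr(self._instance, name)

        def __setattr__(self, name, value):
            # Each assignment becomes a one-attribute Update -- exactly
            # the fine-grained implicitness argued against above.
            self._db.execute(tx.Update(self._instance, **{name: value}))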

> Also, on another issue, it seems like the persistent classes need to
> derive from a special class and define their persistent features in
> a special way, i.e. class Realm(Entity): _attrSpec = [ 'name', ] etc.
> Again, this is going somewhat towards Python language integration
> yet not all the way.

You don't *have* to use the Entity class that comes with PyPerSyst,
but if you do, it lets you define the attributes, alternate keys, and
fields for your subclass in as simple a form as I could think of.

If you don't use the Entity class, then you have to figure out how to
support instance integrity, alternate keys, referential integrity,
bi-directional references, etc.  So I think they provide some benefit.
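
Using the quoted Realm example as the template, a small schema stays
this small.  (_attrSpec is the spelling from the example above;
_keySpec is invented here just to suggest how an alternate key might
be declared.)

    class Realm(Entity):
        _attrSpec = ['name']

    class User(Entity):
        _attrSpec = ['name', 'realm']
        _keySpec = ['name']     # hypothetical: 'name' as an alternate key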

> While I'd certainly agree your version is more concise than what I
> posted first (just an example of a system that does not attempt to use
> Python language features), later in the email (perhaps you'll get to it
> in your next reply) was the simpler:
> 
>    from persistanceSystem import *
>    foo = MyClass()
>    PersistanceSystem_Wrap(foo)
>    # the following defaults to a transaction
>    foo.x = 10
>    # this makes a three-change transaction
>    PersistanceSystem_StartTransaction()
>    foo.y = 20
>    foo.z = 20
>    foo.info = "I am a 3D Point"
>    PersistanceSystem_EndTransaction()
> 
> That approach does not violate any symmetry expectations by users --
> you can assign and retrieve values just like always.

If users expect symmetry, it is because they are used to writing
single-process programs that do not share objects.  Does anyone expect this
kind of symmetry and transparency when writing a multi-threaded
application?  Why not?  Granted, having start/end transaction
semantics might change some of the rules.  But even if we had those in
PyPerSyst, I would probably only use them inside of Transaction
classes, not embedded in application code where they are harder to
find and test.  Explicit transaction objects have many benefits.

It's sort of similar to the notion of separating your application
logic from your GUI code.  Sure, it's easier to just put a bunch of
code in the event handler for a button.  But is that the best way to
code?  In my mind, implicit transactions, or commit/rollback in
application code, are like putting all your business logic in the
event handlers for your GUI widgets.  I'm trying to keep people from
writing crappy persistent applications.

> > PyPerSyst can persist *any* picklable object graph.
> 
> Are the graphs stand-alone, or can they reference other previously
> persisted Python objects (not derived from "Root" or "Entity")?

A PyPerSyst database has a single entry point, named root, which can
be any picklable Python object; the database consists of that object
and any objects reachable from it.  When the root gets pickled (for
example when you do
db.dump()), the whole thing gets pickled and all references are
maintained.  When the database starts, the entire thing gets
unpickled.  The entire thing is always in memory (real, or virtual).
The snapshot and log are on disk.  Each transaction is appended to the
log.  Did that answer your question?
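
To put that recovery path in code form, here is a sketch with
hypothetical names, assuming one pickle per log record as described
above:

    import pickle

    def recover(snapshot_file, log_file):
        root = pickle.load(snapshot_file)       # the last db.dump()
        while True:
            try:
                txn = pickle.load(log_file)     # the next logged transaction
            except EOFError:
                break
            txn.execute(root)                   # deterministic, so safe to replay
        return root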

> > But it also comes with an Entity class and a Root class (that
> > understands Entity classes) that provides additional functionality,
> > such as alternate indexes, referential integrity, instance
> > validation, etc.
> 
> I guess I need to learn more about when these are better handled by
> the persistence system as opposed to the applications that use it.

In my mind, anything that is generic shouldn't have to be reinvented
in application code.  I feel like I've spent most of my career
reinventing one database application after another.  ;-)

> Presumably a very transparent API for persistence is still needed
> for an ODBMS which is Python friendly? (Does ZODB do any of this?)

I started writing a wrapper for ZODB and gave up about a year ago.

> If I need to write any extra code at all for an object to be
> persistent, or derive from a specialized class, I could just derive
> from a class that knows how to use SQL to store pickled fields.

You don't think there is a benefit to not having to use a relational
database, not having to map anything to relational tables, not being
limited to the relational model, and not having to do joins, etc.?  I
don't care how good an O-R mapper is; not having to use one at all is
better.

> I think this is the core of the question of this part of the thread.
> You wrote "I've come to think otherwise". I'd be curious to hear
> more on any use cases or examples on why transparency is not so
> compatible with reliability etc.

I just think implicit transparent transactions would lull users into a
false sense of integrity and make them write sloppy applications that
didn't actually maintain the integrity of their objects when used in a
multi-user environment.  I think the kind of applications I want to
use PyPerSyst for demand that it be difficult for application
programmers to do the wrong thing with regard to the integrity of the
persisted data.  I think having transactions as explicit objects
provides more control over the integrity of the database.  If users
want transparency, it can be done using PyPerSyst; it just isn't the
focus of my current efforts.  And I don't think explicit transactions
are that much of a burden.  Transaction code is a small percentage of
application code, compared to all the interface code you have to
write.  And you could easily write wrappers for transactions that make
them less burdensome.
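
For instance, a two-line helper (my own sugar, not part of PyPerSyst)
already hides most of the ceremony:

    def update(db, instance, **kwargs):
        # Build the generic transaction and hand it to the database.
        return db.execute(tx.Update(instance, **kwargs))

    update(db, user, name='Brian')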

-- 
Patrick K. O'Brien
Orbtech      http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------



